INN Hotels Project¶
Context¶
A significant number of hotel bookings are called off due to cancellations or no-shows. Typical reasons for cancellations include changes of plans and scheduling conflicts. Cancelling is often made easier by the option to do so free of charge or at a low cost, which benefits hotel guests but is less desirable, and possibly revenue-diminishing, for hotels. Such losses are particularly high for last-minute cancellations.
The new technologies involving online booking channels have dramatically changed customers’ booking possibilities and behavior. This adds a further dimension to the challenge of how hotels handle cancellations, which are no longer limited to traditional booking and guest characteristics.
The cancellation of bookings impacts a hotel on various fronts:
- Loss of resources (revenue) when the hotel cannot resell the room.
- Additional costs of distribution channels by increasing commissions or paying for publicity to help sell these rooms.
- Lowering prices last minute, so the hotel can resell a room, resulting in reducing the profit margin.
- Human resources to make arrangements for the guests.
Objective¶
The increasing number of cancellations calls for a Machine Learning based solution that can help predict which bookings are likely to be canceled. INN Hotels Group has a chain of hotels in Portugal. They are facing problems with the high number of booking cancellations and have reached out to your firm for data-driven solutions. As a data scientist, you have to analyze the data provided to find which factors have a high influence on booking cancellations, build a predictive model that can predict in advance which bookings are going to be canceled, and help formulate profitable policies for cancellations and refunds.
Data Description¶
The data contains the different attributes of customers' booking details. The detailed data dictionary is given below.
Data Dictionary
- Booking_ID: unique identifier of each booking
- no_of_adults: Number of adults
- no_of_children: Number of Children
- no_of_weekend_nights: Number of weekend nights (Saturday or Sunday) the guest stayed or booked to stay at the hotel
- no_of_week_nights: Number of week nights (Monday to Friday) the guest stayed or booked to stay at the hotel
- type_of_meal_plan: Type of meal plan booked by the customer:
- Not Selected – No meal plan selected
- Meal Plan 1 – Breakfast
- Meal Plan 2 – Half board (breakfast and one other meal)
- Meal Plan 3 – Full board (breakfast, lunch, and dinner)
- required_car_parking_space: Does the customer require a car parking space? (0 - No, 1- Yes)
- room_type_reserved: Type of room reserved by the customer. The values are ciphered (encoded) by INN Hotels.
- lead_time: Number of days between the date of booking and the arrival date
- arrival_year: Year of arrival date
- arrival_month: Month of arrival date
- arrival_date: Date of the month
- market_segment_type: Market segment designation.
- repeated_guest: Is the customer a repeated guest? (0 - No, 1- Yes)
- no_of_previous_cancellations: Number of previous bookings that were canceled by the customer prior to the current booking
- no_of_previous_bookings_not_canceled: Number of previous bookings not canceled by the customer prior to the current booking
- avg_price_per_room: Average price per day of the reservation; prices of the rooms are dynamic. (in euros)
- no_of_special_requests: Total number of special requests made by the customer (e.g. high floor, view from the room, etc)
- booking_status: Flag indicating if the booking was canceled or not.
Summary¶
Actionable Insights and Recommendations for INN Hotels¶
Cancellation and Refund Policy Optimization¶
1. Introduce Non-Refundable Booking Options¶
- Guests booking early (lead_time) and through Online channels are more likely to cancel.
- Offer discounted non-refundable rates to secure revenue upfront and reduce cancellation risk.
2. Flexible Refunds for Short Lead Time Bookings¶
- Guests with shorter lead_time have lower cancellation probability.
- Allow full or partial refunds for bookings made less than 15 days i
3. Monitor and Encourage Special Requests¶
- Guests with 0–2 special requests show a higher likelihood of cancellation, especially those with zero requests (over 40% cancellation rate).
- Conversely, guests with 3 or more requests almost never cancel.
- Recommendation:
- Encourage guests to submit special requests during booking (e.g., preferences for room setup, amenities).
- Use the number of requests as a positive signal of booking commitment rather than a cancellation risk.
- Avoid penalizing or flagging high-request bookings; they are highly likely to show up.
Other Strategic Recommendations¶
4. Optimize Online Booking Channels¶
- market_segment_type_Online is a strong predictor of cancellations.
- Encourage direct bookings by:
- Offering exclusive perks or discounts via the hotel’s website.
- Displaying clear cancellation policies during checkout.
5. Dynamic Pricing for High-Risk Segments¶
- Long lead_time, no parking, and low avg_price_per_room correlate with higher cancellation risk.
- Raise prices or require partial prepayment for bookings matching this profile.
6. Targeted Offers for Reliable Segments¶
- Guests with short lead_time and fewer requests are more likely to show up.
- Incentivize these custo
Model Insights¶
Pre-Pruned Decision Tree performed best: F1 train - 0.758779, F1 test - 0.754676
Post-Pruned Decision Tree model looks overfitted: F1 train - 0.913977, F1 test - 0.814270
Top Features Contributing to Predictions:
- lead_time
- market_segment_type_Online
- no_of_special_requests
- avg_price_per_room
These features should inform policy design and marketing strategy.
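The overfitting remark above rests on the difference between train and test F1. As a minimal sketch, the gap can be computed from the four scores quoted in this summary; the helper name `generalization_gap` is hypothetical and not part of the notebook.

```python
# F1 scores copied from the Model Insights summary above.
scores = {
    "pre_pruned": {"train": 0.758779, "test": 0.754676},
    "post_pruned": {"train": 0.913977, "test": 0.814270},
}


def generalization_gap(train_f1: float, test_f1: float) -> float:
    """Return train F1 minus test F1 (hypothetical helper); a larger gap hints at overfitting."""
    return train_f1 - test_f1


gaps = {
    name: round(generalization_gap(s["train"], s["test"]), 6)
    for name, s in scores.items()
}

for name, gap in gaps.items():
    print(f"{name}: train-test F1 gap = {gap:.6f}")
```

A small gap only says that train and test performance agree; it says nothing about which model has the higher test score, so both numbers are worth reading together.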
Importing necessary libraries and data¶
# Installing the libraries with the specified version.
#!pip install pandas==1.5.3 numpy==1.25.2 matplotlib==3.7.1 seaborn==0.13.1 scikit-learn==1.2.2 statsmodels==0.14.1 -q --user
Note: After running the above cell, kindly restart the notebook kernel and run all cells sequentially from the start again.
# this will help in making the Python code more structured automatically (good coding practice)
#%load_ext nb_black
# Library to suppress warnings or deprecation notes
import warnings
warnings.filterwarnings("ignore")
# Libraries to help with reading and manipulating data
import pandas as pd
import numpy as np
# Library to split data
from sklearn.model_selection import train_test_split
# libraries to help with data visualization
import matplotlib.pyplot as plt
import seaborn as sns
# To build model for prediction
import statsmodels.stats.api as sms
from statsmodels.stats.outliers_influence import variance_inflation_factor
import statsmodels.api as sm
from statsmodels.tools.tools import add_constant
from sklearn.linear_model import LogisticRegression
# Removes the limit for the number of displayed columns
pd.set_option("display.max_columns", None)
# Sets the limit for the number of displayed rows
pd.set_option("display.max_rows", 200)
# Libraries to build decision tree classifier
from sklearn.tree import DecisionTreeClassifier
from sklearn import tree
# To tune different models
from sklearn.model_selection import GridSearchCV
# To perform statistical analysis
import scipy.stats as stats
# To get different metric scores
from sklearn.metrics import (
f1_score,
accuracy_score,
recall_score,
precision_score,
confusion_matrix,
make_scorer,
roc_auc_score,
roc_curve,
precision_recall_curve
)
#from sklearn.metrics import plot_confusion_matrix
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
data = pd.read_csv("INNHotelsGroup.csv")
'''from google.colab import drive
drive.mount('/content/drive')
file_path = '/content/drive/MyDrive/Austin/2.5 Project Supervised Learning - Classification INN Hotels/INNHotelsGroup.csv'
data = pd.read_csv(file_path)'''
# copying data to another variable to avoid any changes to original data
df = data.copy()
Data Overview¶
- Observations
- Sanity checks
View the first and last 5 rows of the dataset.¶
df.head()
| | Booking_ID | no_of_adults | no_of_children | no_of_weekend_nights | no_of_week_nights | type_of_meal_plan | required_car_parking_space | room_type_reserved | lead_time | arrival_year | arrival_month | arrival_date | market_segment_type | repeated_guest | no_of_previous_cancellations | no_of_previous_bookings_not_canceled | avg_price_per_room | no_of_special_requests | booking_status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | INN00001 | 2 | 0 | 1 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 224 | 2017 | 10 | 2 | Offline | 0 | 0 | 0 | 65.00 | 0 | Not_Canceled |
| 1 | INN00002 | 2 | 0 | 2 | 3 | Not Selected | 0 | Room_Type 1 | 5 | 2018 | 11 | 6 | Online | 0 | 0 | 0 | 106.68 | 1 | Not_Canceled |
| 2 | INN00003 | 1 | 0 | 2 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 1 | 2018 | 2 | 28 | Online | 0 | 0 | 0 | 60.00 | 0 | Canceled |
| 3 | INN00004 | 2 | 0 | 0 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 211 | 2018 | 5 | 20 | Online | 0 | 0 | 0 | 100.00 | 0 | Canceled |
| 4 | INN00005 | 2 | 0 | 1 | 1 | Not Selected | 0 | Room_Type 1 | 48 | 2018 | 4 | 11 | Online | 0 | 0 | 0 | 94.50 | 0 | Canceled |
df.tail()
| | Booking_ID | no_of_adults | no_of_children | no_of_weekend_nights | no_of_week_nights | type_of_meal_plan | required_car_parking_space | room_type_reserved | lead_time | arrival_year | arrival_month | arrival_date | market_segment_type | repeated_guest | no_of_previous_cancellations | no_of_previous_bookings_not_canceled | avg_price_per_room | no_of_special_requests | booking_status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 36270 | INN36271 | 3 | 0 | 2 | 6 | Meal Plan 1 | 0 | Room_Type 4 | 85 | 2018 | 8 | 3 | Online | 0 | 0 | 0 | 167.80 | 1 | Not_Canceled |
| 36271 | INN36272 | 2 | 0 | 1 | 3 | Meal Plan 1 | 0 | Room_Type 1 | 228 | 2018 | 10 | 17 | Online | 0 | 0 | 0 | 90.95 | 2 | Canceled |
| 36272 | INN36273 | 2 | 0 | 2 | 6 | Meal Plan 1 | 0 | Room_Type 1 | 148 | 2018 | 7 | 1 | Online | 0 | 0 | 0 | 98.39 | 2 | Not_Canceled |
| 36273 | INN36274 | 2 | 0 | 0 | 3 | Not Selected | 0 | Room_Type 1 | 63 | 2018 | 4 | 21 | Online | 0 | 0 | 0 | 94.50 | 0 | Canceled |
| 36274 | INN36275 | 2 | 0 | 1 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 207 | 2018 | 12 | 30 | Offline | 0 | 0 | 0 | 161.67 | 0 | Not_Canceled |
Understand the shape of the dataset.¶
df.shape
(36275, 19)
- The dataset has 36275 rows and 19 columns of data
Check the data types of the columns for the dataset.¶
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 36275 entries, 0 to 36274
Data columns (total 19 columns):
 #   Column                                Non-Null Count  Dtype
---  ------                                --------------  -----
 0   Booking_ID                            36275 non-null  object
 1   no_of_adults                          36275 non-null  int64
 2   no_of_children                        36275 non-null  int64
 3   no_of_weekend_nights                  36275 non-null  int64
 4   no_of_week_nights                     36275 non-null  int64
 5   type_of_meal_plan                     36275 non-null  object
 6   required_car_parking_space            36275 non-null  int64
 7   room_type_reserved                    36275 non-null  object
 8   lead_time                             36275 non-null  int64
 9   arrival_year                          36275 non-null  int64
 10  arrival_month                         36275 non-null  int64
 11  arrival_date                          36275 non-null  int64
 12  market_segment_type                   36275 non-null  object
 13  repeated_guest                        36275 non-null  int64
 14  no_of_previous_cancellations          36275 non-null  int64
 15  no_of_previous_bookings_not_canceled  36275 non-null  int64
 16  avg_price_per_room                    36275 non-null  float64
 17  no_of_special_requests                36275 non-null  int64
 18  booking_status                        36275 non-null  object
dtypes: float64(1), int64(13), object(5)
memory usage: 5.3+ MB
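Observations:
- Most columns are numeric: 13 are int64 and 1 (avg_price_per_room) is float64.
- 5 columns are of object type: Booking_ID, type_of_meal_plan, room_type_reserved, market_segment_type, and booking_status.
- Booking_ID is only an identifier and is not needed for modeling.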
Summary of the dataset.¶
df.describe(include="all")
| | Booking_ID | no_of_adults | no_of_children | no_of_weekend_nights | no_of_week_nights | type_of_meal_plan | required_car_parking_space | room_type_reserved | lead_time | arrival_year | arrival_month | arrival_date | market_segment_type | repeated_guest | no_of_previous_cancellations | no_of_previous_bookings_not_canceled | avg_price_per_room | no_of_special_requests | booking_status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 36275 | 36275.000000 | 36275.000000 | 36275.000000 | 36275.000000 | 36275 | 36275.000000 | 36275 | 36275.000000 | 36275.000000 | 36275.000000 | 36275.000000 | 36275 | 36275.000000 | 36275.000000 | 36275.000000 | 36275.000000 | 36275.000000 | 36275 |
| unique | 36275 | NaN | NaN | NaN | NaN | 4 | NaN | 7 | NaN | NaN | NaN | NaN | 5 | NaN | NaN | NaN | NaN | NaN | 2 |
| top | INN00001 | NaN | NaN | NaN | NaN | Meal Plan 1 | NaN | Room_Type 1 | NaN | NaN | NaN | NaN | Online | NaN | NaN | NaN | NaN | NaN | Not_Canceled |
| freq | 1 | NaN | NaN | NaN | NaN | 27835 | NaN | 28130 | NaN | NaN | NaN | NaN | 23214 | NaN | NaN | NaN | NaN | NaN | 24390 |
| mean | NaN | 1.844962 | 0.105279 | 0.810724 | 2.204300 | NaN | 0.030986 | NaN | 85.232557 | 2017.820427 | 7.423653 | 15.596995 | NaN | 0.025637 | 0.023349 | 0.153411 | 103.423539 | 0.619655 | NaN |
| std | NaN | 0.518715 | 0.402648 | 0.870644 | 1.410905 | NaN | 0.173281 | NaN | 85.930817 | 0.383836 | 3.069894 | 8.740447 | NaN | 0.158053 | 0.368331 | 1.754171 | 35.089424 | 0.786236 | NaN |
| min | NaN | 0.000000 | 0.000000 | 0.000000 | 0.000000 | NaN | 0.000000 | NaN | 0.000000 | 2017.000000 | 1.000000 | 1.000000 | NaN | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | NaN |
| 25% | NaN | 2.000000 | 0.000000 | 0.000000 | 1.000000 | NaN | 0.000000 | NaN | 17.000000 | 2018.000000 | 5.000000 | 8.000000 | NaN | 0.000000 | 0.000000 | 0.000000 | 80.300000 | 0.000000 | NaN |
| 50% | NaN | 2.000000 | 0.000000 | 1.000000 | 2.000000 | NaN | 0.000000 | NaN | 57.000000 | 2018.000000 | 8.000000 | 16.000000 | NaN | 0.000000 | 0.000000 | 0.000000 | 99.450000 | 0.000000 | NaN |
| 75% | NaN | 2.000000 | 0.000000 | 2.000000 | 3.000000 | NaN | 0.000000 | NaN | 126.000000 | 2018.000000 | 10.000000 | 23.000000 | NaN | 0.000000 | 0.000000 | 0.000000 | 120.000000 | 1.000000 | NaN |
| max | NaN | 4.000000 | 10.000000 | 7.000000 | 17.000000 | NaN | 1.000000 | NaN | 443.000000 | 2018.000000 | 12.000000 | 31.000000 | NaN | 1.000000 | 13.000000 | 58.000000 | 540.000000 | 5.000000 | NaN |
Observations¶
Adults per booking:
- Mean: ~1.84 adults
- Most bookings are for 2 adults (25%, 50%, 75% quantiles = 2)
- Maximum: 4 adults
Children per booking:
- Mean: ~0.1 → most bookings are without children
- Max: 10 children (likely an outlier)
Stay Details
Weekend nights:
- Most bookings have 0–2 weekend nights (median = 1)
Weekday nights:
- Median stay = 2 nights, max = 17 nights
Lead time:
- Mean lead time before arrival: ~85 days
- Max lead time: 443 days, suggesting bookings were made far in advance
Meal Plans
- 4 types of type_of_meal_plan
- Most popular: Meal Plan 1 (used in ~27,835 bookings)
Car Parking
- Only ~3% of bookings requested a parking space (mean = 0.030)
Room Type
- 7 types of room_type_reserved
- Most frequent: Room_Type 1 (≈ 28,130 bookings)
Date of Arrival
- All bookings are from 2017 or 2018
- Median month: August
- Most arrival dates cluster around the middle of the month (median = 16)
Market Segment
- 5 market segment types
- Most common: Online (~23,214 bookings)
Repeat Guests
- Very few are repeat guests (mean ≈ 0.026, i.e. ~2.6%)
Cancellations
- Most guests had no prior cancellations.
- Previous cancellations: Median: 0, Max: 13
- Previous bookings not canceled: Median: 0, Max: 58
Price and Requests
- Average room price: ~103.42 euros
- Max room price: 540 euros
Special requests:
- Most bookings had 0–1 requests
- Max: 5 requests
Booking Status
- Two classes: Canceled and Not_Canceled
- Most bookings were not canceled (Not_Canceled = ~24,390)
# checking for unique values in ID column
df["Booking_ID"].nunique()
36275
- Since all the values in Booking_ID column are unique we can drop it
df.drop(["Booking_ID"], axis=1, inplace=True)
Check for missing values¶
df.isnull().sum()
no_of_adults                            0
no_of_children                          0
no_of_weekend_nights                    0
no_of_week_nights                       0
type_of_meal_plan                       0
required_car_parking_space              0
room_type_reserved                      0
lead_time                               0
arrival_year                            0
arrival_month                           0
arrival_date                            0
market_segment_type                     0
repeated_guest                          0
no_of_previous_cancellations            0
no_of_previous_bookings_not_canceled    0
avg_price_per_room                      0
no_of_special_requests                  0
booking_status                          0
dtype: int64
- There are no missing values in the dataset
Exploratory Data Analysis (EDA)¶
- EDA is an important part of any project involving data.
- It is important to investigate and understand the data better before building a model with it.
- A few questions have been mentioned below which will help you approach the analysis in the right manner and generate insights from the data.
- A thorough analysis of the data, in addition to the questions mentioned below, should be done.
Leading Questions:
- What are the busiest months in the hotel?
- Which market segment do most of the guests come from?
- Hotel rates are dynamic and change according to demand and customer demographics. What are the differences in room prices in different market segments?
- What percentage of bookings are canceled?
- Repeating guests are the guests who stay in the hotel often and are important to brand equity. What percentage of repeating guests cancel?
- Many guests have special requirements when booking a hotel room. Do these requirements affect booking cancellation?
The below functions need to be defined to carry out the EDA.
# function to create labeled barplots
def labeled_barplot(data, feature, perc=False, n=None):
    """
    Barplot with percentage at the top

    data: dataframe
    feature: dataframe column
    perc: whether to display percentages instead of count (default is False)
    n: displays the top n category levels (default is None, i.e., display all levels)
    """
    total = len(data[feature])  # length of the column
    count = data[feature].nunique()
    if n is None:
        plt.figure(figsize=(count + 2, 6))
    else:
        plt.figure(figsize=(n + 2, 6))

    plt.xticks(rotation=90, fontsize=15)
    ax = sns.countplot(
        data=data,
        x=feature,
        palette="Paired",
        order=data[feature].value_counts().index[:n],
    )

    for p in ax.patches:
        if perc:
            label = "{:.1f}%".format(
                100 * p.get_height() / total
            )  # percentage of each class of the category
        else:
            label = p.get_height()  # count of each level of the category

        x = p.get_x() + p.get_width() / 2  # horizontal center of the bar
        y = p.get_height()  # height of the bar
        ax.annotate(
            label,
            (x, y),
            ha="center",
            va="center",
            size=12,
            xytext=(0, 5),
            textcoords="offset points",
        )  # annotate the bar with its count or percentage

    plt.show()  # show the plot
# function to plot stacked bar chart
def stacked_barplot(data, predictor, target):
    """
    Print the category counts and plot a stacked bar chart

    data: dataframe
    predictor: independent variable
    target: target variable
    """
    count = data[predictor].nunique()
    sorter = data[target].value_counts().index[-1]  # least frequent target class
    tab1 = pd.crosstab(data[predictor], data[target], margins=True).sort_values(
        by=sorter, ascending=False
    )
    print(tab1)
    print("-" * 120)
    tab = pd.crosstab(data[predictor], data[target], normalize="index").sort_values(
        by=sorter, ascending=False
    )
    tab.plot(kind="bar", stacked=True, figsize=(count + 5, 6))
    # place the legend outside the axes, to the right of the plot
    plt.legend(loc="upper left", bbox_to_anchor=(1, 1))
    plt.show()
def histogram_boxplot(data, feature, figsize=(15, 10), kde=False, bins=None):
    """
    Boxplot and histogram combined

    data: dataframe
    feature: dataframe column
    figsize: size of figure (default (15,10))
    kde: whether to show the density curve (default False)
    bins: number of bins for histogram (default None)
    """
    f2, (ax_box2, ax_hist2) = plt.subplots(
        nrows=2,  # Number of rows of the subplot grid= 2
        sharex=True,  # x-axis will be shared among all subplots
        gridspec_kw={"height_ratios": (0.25, 0.75)},
        figsize=figsize,
    )  # creating the 2 subplots
    sns.boxplot(
        data=data, x=feature, ax=ax_box2, showmeans=True, color="violet"
    )  # boxplot will be created and a triangle will indicate the mean value of the column
    if bins:
        sns.histplot(data=data, x=feature, kde=kde, ax=ax_hist2, bins=bins)
    else:
        sns.histplot(data=data, x=feature, kde=kde, ax=ax_hist2)  # For histogram
    ax_hist2.axvline(
        data[feature].mean(), color="green", linestyle="--"
    )  # Add mean to the histogram
    ax_hist2.axvline(
        data[feature].median(), color="black", linestyle="-"
    )  # Add median to the histogram
### function to plot distributions wrt target
def distribution_plot_wrt_target(data, predictor, target):

    fig, axs = plt.subplots(2, 2, figsize=(12, 10))

    target_uniq = data[target].unique()

    axs[0, 0].set_title("Distribution of target for target=" + str(target_uniq[0]))
    sns.histplot(
        data=data[data[target] == target_uniq[0]],
        x=predictor,
        kde=True,
        ax=axs[0, 0],
        color="teal",
        stat="density",
    )

    axs[0, 1].set_title("Distribution of target for target=" + str(target_uniq[1]))
    sns.histplot(
        data=data[data[target] == target_uniq[1]],
        x=predictor,
        kde=True,
        ax=axs[0, 1],
        color="orange",
        stat="density",
    )

    axs[1, 0].set_title("Boxplot w.r.t target")
    sns.boxplot(data=data, x=target, y=predictor, ax=axs[1, 0], palette="gist_rainbow")

    axs[1, 1].set_title("Boxplot (without outliers) w.r.t target")
    sns.boxplot(
        data=data,
        x=target,
        y=predictor,
        ax=axs[1, 1],
        showfliers=False,
        palette="gist_rainbow",
    )

    plt.tight_layout()
    plt.show()
Univariate analysis¶
Observations on no_of_adults¶
labeled_barplot(df, "no_of_adults", perc=True)
- The majority of entries (72.0%) have 2 adults.
- 1 adult is recorded in 21.2% of the cases.
- Entries with 0 adults (0.4%) may need further investigation for validity or special cases (e.g., placeholder or missing data).
Observations on no_of_children¶
labeled_barplot(df, "no_of_children", perc=True)
- 92.6% of bookings are without children (33,577 of 36,275).
- Bookings with 9 or 10 children look like outliers.
Observations on no_of_weekend_nights¶
labeled_barplot(df, "no_of_weekend_nights", perc=True)
- 0 weekend nights: 46.5%
- 1 night: 27.6%
- 2 nights: 25.0%
Almost half of bookings are for weekday-only stays (0 weekend nights), which could imply:
- Business travelers
- Budget-conscious guests avoiding higher weekend rates
Observations on no_of_week_nights¶
labeled_barplot(df, "no_of_week_nights", perc=True)
Most frequent value:
- 2 nights is the most common, accounting for 31.5% of the data.
- 1 night: 26.2%
- 3 nights: 21.6%
These three values (1–3 nights) together account for over 79% of the observations
Observations on type_of_meal_plan¶
labeled_barplot(df, "type_of_meal_plan", perc=True)
- Meal Plan 1 is by far the most common choice, selected by 76.7% of customers.
- Not Selected accounts for 14.1%, indicating a notable portion of customers did not opt into a meal plan.
- Meal Plan 2 was selected by 9.1% of customers.
- Meal Plan 3 has 0.0%, suggesting it is either deprecated, unavailable, or possibly an erroneous category.
Meal Plan 1 likely represents the default or most cost-effective option. The "Not Selected" category may require further analysis to understand whether this group correlates with specific booking behaviors or customer profiles. Meal Plan 3 is probably invalid or unused.
Observations on required_car_parking_space¶
labeled_barplot(df, "required_car_parking_space", perc=True)
- 96.9% of customers did not require a car parking space (0).
- Only 3.1% of customers requested a car parking space (1).
The demand for parking space is very low. This may suggest:
- Most customers might not be traveling by car.
- The target audience could be urban-based or traveling by public transportation.
Parking space availability may not be a significant factor in customer decisions.
Observations on room_type_reserved¶
labeled_barplot(df, "room_type_reserved", perc=True)
Room_Type 1 is overwhelmingly the most reserved, making up 77.5% of all bookings. Room_Type 4 follows at 16.7%, and the remaining types each account for less than 3%
The dataset is heavily imbalanced in terms of room types, with Room_Type 1 dominating. Such a skew suggests that Room_Type 1 might be the default or most economical option, or it could be overbooked or overrepresented. Rare categories (Types 3, 5, 7) may be:
- Specialty rooms
- Newly added/retired room types
- Errors or edge cases
Observations on lead_time¶
histogram_boxplot(df, "lead_time")
Boxplot Insights (Top Plot):
- The distribution is right-skewed (positively skewed).
- The median lead time lies well below the center, suggesting most bookings are made with shorter notice.
- There are many outliers on the higher end (beyond roughly 290 days, given Q3 = 126 and IQR = 109), indicating some bookings are made extremely early.
- The box itself is wide, showing considerable variability among typical bookings.
Histogram Insights (Bottom Plot):
- The most frequent lead time is very short (close to 0–10 days), suggesting many last-minute bookings.
- There’s a steady decline in frequency as lead time increases.
- The green dashed line marks the mean lead time (~85 days), which is noticeably right of the median, another sign of skewness.
- The solid black line marks the median (57 days), showing the data's asymmetry.
Summary:
- About half of all bookings are made within roughly 60 days of check-in (median = 57 days).
- Long lead times are less common but still present in significant numbers.
- Outliers and skewness should be considered when modeling or using this variable, especially with linear models.
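To make the outlier remark concrete, here is a small sketch of the usual 1.5 × IQR boxplot rule applied to the lead_time quartiles reported by df.describe() above (25% = 17, 75% = 126). The function name `iqr_fences` is an assumption introduced for illustration; it is not defined in the notebook.

```python
def iqr_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Return the (lower, upper) fences of the k * IQR boxplot rule (hypothetical helper)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles of lead_time taken from the describe() table above.
lower, upper = iqr_fences(q1=17.0, q3=126.0)

print(f"lower fence: {lower}")  # negative, so no low-end outliers are possible
print(f"upper fence: {upper}")  # lead times above this value are drawn as outliers
```

Because the lower fence is negative and lead_time cannot be, every flagged point sits in the right tail, which matches the right-skew seen in the plot.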
Observations on arrival_year¶
labeled_barplot(df, "arrival_year", perc=True)
- 2018 accounts for 82% of the bookings.
- 2017 accounts for only 18% of the bookings.
The dataset is heavily skewed toward 2018, which could suggest:
- Increased customer volume in 2018.
- More complete data collection in 2018.
- A partial dataset for 2017 or initial rollout.
Observations on arrival_month¶
labeled_barplot(df, "arrival_month", perc=True)
October (Month 10) is the peak month, accounting for 14.7% of arrivals.
Followed by:
- September (12.7%)
- August (10.5%)
- June (8.8%)
The lowest booking month is January (2.8%). There is a clear seasonal trend, with late summer to early fall (Aug–Oct) being the busiest period. Winter months (Jan–Mar) see significantly fewer arrivals.
Observations on arrival_date¶
labeled_barplot(df, "arrival_date", perc=True)
- The distribution is fairly uniform across most days of the month.
- The 13th, 17th, 2nd, and 4th each account for 3.7% of bookings — the most frequent arrival dates.
- The 31st has the lowest percentage (1.6%), likely due to fewer months having 31 days.
- The days from the 22nd to 27th also show a slight drop, especially 23rd (2.7%) and 22nd (2.8%).
Observations on market_segment_type¶
labeled_barplot(df, "market_segment_type", perc=True)
- Online (64%) and Offline (29%) dominate.
- Corporate (5.6%) is the only other notable category.
- Very few from Complementary or Aviation segments.
Observations on repeated_guest¶
labeled_barplot(df, "repeated_guest", perc=True)
97.4% of the guests are first-time visitors (0). Only 2.6% are repeated guests (1).
The hotel attracts mostly new customers — effective marketing or location-based appeal.
Low return rate could indicate:
- It's a one-time destination (e.g., resort).
- Potential to improve customer retention or loyalty programs.
Observations on no_of_previous_cancellations¶
labeled_barplot(df, "no_of_previous_cancellations", perc=True)
99.1% of guests have no prior cancellations. Only a very small fraction (less than 1%) have canceled one or more times before.
Observations on avg_price_per_room¶
histogram_boxplot(df, "avg_price_per_room")
Boxplot Observations (Top Plot):
- Many outliers on the higher end (right side) indicate that some bookings have significantly higher prices than most others.
- The median (vertical line in the box) is around 100 (99.45), and the interquartile range (IQR) spans roughly 80–120.
- The box sits in the lower part of the price range, again hinting at right skewness.
Histogram Observations (Bottom Plot):
- The distribution is right-skewed: most values are concentrated between 50 and 150, with a long tail extending beyond 300.
- There is a peak around 100, close to the mean and the median, which are marked by the green dashed and solid black vertical lines respectively.
A spike at 0 suggests some zero-priced entries, possibly due to:
- Complimentary stays
- Data entry errors
- Promotions or reward redemptions
Observations on no_of_special_requests¶
labeled_barplot(df, "no_of_special_requests", perc=True)
- Over half (54.5%) of guests did not make any special requests.
- 31.4% made one special request, while 12% made two.
Observations on booking_status¶
labeled_barplot(df, "booking_status", perc=True)
- Not_Canceled bookings constitute 67.2% of the data.
- Canceled bookings make up the remaining 32.8%.
- The chart indicates a higher rate of successful bookings compared to cancellations.
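With a roughly 67/33 split, accuracy alone can be misleading, which is one reason the summary reports F1. The sketch below is an illustration rather than part of the original analysis: it uses the class counts shown above to score a trivial model that always predicts Not_Canceled, treating Canceled as the positive class (an assumption).

```python
# Class counts taken from the booking_status summary above.
canceled, not_canceled = 11885, 24390
total = canceled + not_canceled

# A trivial baseline that predicts "Not_Canceled" for every booking.
baseline_accuracy = not_canceled / total

# Confusion counts for the "Canceled" class under that baseline.
tp, fp, fn = 0, 0, canceled
baseline_f1 = 2 * tp / (2 * tp + fp + fn)  # F1 = 2TP / (2TP + FP + FN)

print(f"baseline accuracy: {baseline_accuracy:.3f}")
print(f"baseline F1 (Canceled): {baseline_f1:.3f}")
```

Any useful classifier has to beat this accuracy while also achieving a non-zero F1 on the Canceled class.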
Bivariate Analysis¶
no_of_adults vs booking_status¶
stacked_barplot(df, "no_of_adults", "booking_status")
booking_status  Canceled  Not_Canceled    All
no_of_adults
All                11885         24390  36275
2                   9119         16989  26108
1                   1856          5839   7695
3                    863          1454   2317
0                     44            95    139
4                      3            13     16
------------------------------------------------------------------------------------------------------------------------
The cancellation rate does not change monotonically with the number of adults:
- 0 adults: ~32% (44 of 139)
- 1 adult: ~24% (1,856 of 7,695), the lowest rate among the well-populated groups
- 2 adults: ~35% (9,119 of 26,108)
- 3 adults: ~37% (863 of 2,317), the highest rate
- 4 adults: ~19% (3 of 16), though the sample size is very small
- Bookings for 2 adults make up the bulk of all bookings, so they account for most cancellations in absolute terms.
- The low rate for 4 adults rests on only 16 bookings and might not be statistically significant.
- Among the groups with enough data (1–3 adults), the cancellation rate rises with the number of adults, so solo travelers appear the most likely to follow through.
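The per-category rates can be recomputed directly from the printed counts. The sketch below rebuilds the no_of_adults table by hand so that it runs without the CSV; on the full dataset, `pd.crosstab(df["no_of_adults"], df["booking_status"], normalize="index")`, which stacked_barplot already uses internally, gives the same proportions.

```python
import pandas as pd

# Counts copied from the crosstab printed above (the "All" margin row is omitted).
counts = pd.DataFrame(
    {
        "Canceled": [44, 1856, 9119, 863, 3],
        "Not_Canceled": [95, 5839, 16989, 1454, 13],
    },
    index=pd.Index([0, 1, 2, 3, 4], name="no_of_adults"),
)

# Row totals and the share of canceled bookings within each row.
counts["n"] = counts["Canceled"] + counts["Not_Canceled"]
counts["cancel_rate"] = counts["Canceled"] / counts["n"]

print(counts.round({"cancel_rate": 3}))
```

Keeping the row total `n` next to each rate makes it easier to see which rates rest on only a handful of bookings.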
no_of_children vs booking_status¶
stacked_barplot(df, "no_of_children", "booking_status")
booking_status  Canceled  Not_Canceled    All
no_of_children
All                11885         24390  36275
0                  10882         22695  33577
1                    540          1078   1618
2                    457           601   1058
3                      5            14     19
9                      1             1      2
10                     0             1      1
------------------------------------------------------------------------------------------------------------------------
A higher number of children appears to be associated with a higher cancellation rate, but only the 0–2 groups have enough data to judge:
- 0 children (~92.6% of bookings): ~32% cancellation, comparable to the overall average of 32.8%
- 1 child: ~33% cancellation
- 2 children: ~43% cancellation
- 3 children: ~26% cancellation (5 of 19 bookings)
- 9 and 10 children: 1 of 2 and 0 of 1 bookings canceled; with so few bookings, these rates are not statistically reliable.
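One way to quantify "not statistically reliable" is a confidence interval for each cancellation proportion. The sketch below uses the Wilson score interval, which is a technique brought in here for illustration and not something the notebook itself applies; `wilson_interval` is a hypothetical helper, and the counts come from the table above.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


# (label, canceled bookings, total bookings) from the no_of_children crosstab.
groups = [("2 children", 457, 1058), ("3 children", 5, 19), ("9 children", 1, 2)]

for label, k, n in groups:
    low, high = wilson_interval(k, n)
    print(f"{label}: rate = {k / n:.3f}, interval = ({low:.3f}, {high:.3f})")
```

The interval for 2 children is narrow, while the intervals for 3 and 9 children are wide enough that their rates cannot be distinguished from the overall average.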
market_segment_type vs booking_status¶
stacked_barplot(df, "market_segment_type", "booking_status")
booking_status       Canceled  Not_Canceled    All
market_segment_type
All                     11885         24390  36275
Online                   8475         14739  23214
Offline                  3153          7375  10528
Corporate                 220          1797   2017
Aviation                   37            88    125
Complementary               0           391    391
------------------------------------------------------------------------------------------------------------------------
- Online bookings have the highest cancellation rate (~36.5%), making them a key area for intervention or further segmentation.
- Corporate bookings show a very low cancellation rate (~11%), suggesting strong commitment—possibly due to business travel policies or agreements.
- Complementary bookings have no cancellations at all, but this segment may represent internal or promotion-related bookings and likely shouldn't be treated like the rest.
- Offline and Aviation segments have moderate cancellation rates (~30%).
no_of_weekend_nights vs booking_status¶
stacked_barplot(df, "no_of_weekend_nights", "booking_status")
booking_status        Canceled  Not_Canceled    All
no_of_weekend_nights
All                      11885         24390  36275
0                         5093         11779  16872
1                         3432          6563   9995
2                         3157          5914   9071
4                           83            46    129
3                           74            79    153
5                           29             5     34
6                           16             4     20
7                            1             0      1
------------------------------------------------------------------------------------------------------------------------
- For 0–2 weekend nights (which make up ~99% of the data), cancellation rates are close to the dataset's overall average (~30–35%).
- From 3 weekend nights upward, cancellation rates rise sharply:
- 3 nights: ~48% cancellation rate
- 4 nights: ~64% cancellation rate
- 5–7 nights: 80% or more, though these categories have very few bookings (55 in total)
no_of_week_nights vs booking_status¶
stacked_barplot(df, "no_of_week_nights", "booking_status")
| no_of_week_nights | Canceled | Not_Canceled | All |
|---|---|---|---|
| All | 11885 | 24390 | 36275 |
| 2 | 3997 | 7447 | 11444 |
| 3 | 2574 | 5265 | 7839 |
| 1 | 2572 | 6916 | 9488 |
| 4 | 1143 | 1847 | 2990 |
| 0 | 679 | 1708 | 2387 |
| 5 | 632 | 982 | 1614 |
| 6 | 88 | 101 | 189 |
| 10 | 53 | 9 | 62 |
| 7 | 52 | 61 | 113 |
| 8 | 32 | 30 | 62 |
| 9 | 21 | 13 | 34 |
| 11 | 14 | 3 | 17 |
| 15 | 8 | 2 | 10 |
| 12 | 7 | 2 | 9 |
| 13 | 5 | 0 | 5 |
| 14 | 4 | 3 | 7 |
| 16 | 2 | 0 | 2 |
| 17 | 2 | 1 | 3 |
- Shorter stays (0–2 week nights) are the most frequent and have the lowest cancellation rates (~27–35%).
- Moderate stays (3–5 nights) show an increasing trend in cancellations (~33% rising to ~39%).
Longer weeknight stays (6+ nights) tend to have much higher cancellation rates, especially:
- 7+ nights: over 60% taken together, although 7 and 8 nights individually sit at ~46% and ~52%
- 13–17 nights: between ~57% and 100%, but with extremely low counts (10 bookings or fewer each).
arrival_year vs booking_status¶
stacked_barplot(df, "arrival_year", "booking_status")
| arrival_year | Canceled | Not_Canceled | All |
|---|---|---|---|
| All | 11885 | 24390 | 36275 |
| 2018 | 10924 | 18837 | 29761 |
| 2017 | 961 | 5553 | 6514 |
- 2018 shows a much higher cancellation rate than 2017: ~36.7% (10,924 of 29,761) versus ~14.8% (961 of 6,514).
- In 2017, the overwhelming majority of bookings (~85%) were completed (i.e., not canceled).
- 2018 also accounts for ~82% of all bookings, so the overall cancellation rate is driven mostly by that year.
arrival_month vs booking_status¶
stacked_barplot(df, "arrival_month", "booking_status")
| arrival_month | Canceled | Not_Canceled | All |
|---|---|---|---|
| All | 11885 | 24390 | 36275 |
| 10 | 1880 | 3437 | 5317 |
| 9 | 1538 | 3073 | 4611 |
| 8 | 1488 | 2325 | 3813 |
| 7 | 1314 | 1606 | 2920 |
| 6 | 1291 | 1912 | 3203 |
| 4 | 995 | 1741 | 2736 |
| 5 | 948 | 1650 | 2598 |
| 11 | 875 | 2105 | 2980 |
| 3 | 700 | 1658 | 2358 |
| 2 | 430 | 1274 | 1704 |
| 12 | 402 | 2619 | 3021 |
| 1 | 24 | 990 | 1014 |
- Summer months (June–August) see the highest cancellation rates, especially July (approaching 45%).
- Winter months (December, January) show very low cancellation rates: ~13% in December and only ~2% in January (24 of 1,014 bookings, the smallest monthly volume).
- Moderate cancellation rates are observed in spring (March–May) and fall (September–November).
arrival_date vs booking_status¶
stacked_barplot(df, "arrival_date", "booking_status")
| arrival_date | Canceled | Not_Canceled | All |
|---|---|---|---|
| All | 11885 | 24390 | 36275 |
| 15 | 538 | 735 | 1273 |
| 4 | 474 | 853 | 1327 |
| 16 | 473 | 833 | 1306 |
| 30 | 465 | 751 | 1216 |
| 1 | 465 | 668 | 1133 |
| 12 | 460 | 744 | 1204 |
| 17 | 448 | 897 | 1345 |
| 6 | 444 | 829 | 1273 |
| 26 | 425 | 721 | 1146 |
| 19 | 413 | 914 | 1327 |
| 20 | 413 | 868 | 1281 |
| 13 | 408 | 950 | 1358 |
| 28 | 405 | 724 | 1129 |
| 3 | 403 | 695 | 1098 |
| 25 | 395 | 751 | 1146 |
| 21 | 376 | 782 | 1158 |
| 24 | 372 | 731 | 1103 |
| 18 | 366 | 894 | 1260 |
| 7 | 364 | 746 | 1110 |
| 8 | 356 | 842 | 1198 |
| 22 | 351 | 672 | 1023 |
| 23 | 341 | 649 | 990 |
| 29 | 334 | 856 | 1190 |
| 11 | 330 | 768 | 1098 |
| 5 | 328 | 826 | 1154 |
| 14 | 327 | 915 | 1242 |
| 10 | 318 | 771 | 1089 |
| 27 | 313 | 746 | 1059 |
| 2 | 308 | 1023 | 1331 |
| 9 | 294 | 836 | 1130 |
| 31 | 178 | 400 | 578 |
room_type_reserved vs booking_status¶
stacked_barplot(df, "room_type_reserved", "booking_status")
| room_type_reserved | Canceled | Not_Canceled | All |
|---|---|---|---|
| All | 11885 | 24390 | 36275 |
| Room_Type 1 | 9072 | 19058 | 28130 |
| Room_Type 4 | 2069 | 3988 | 6057 |
| Room_Type 6 | 406 | 560 | 966 |
| Room_Type 2 | 228 | 464 | 692 |
| Room_Type 5 | 72 | 193 | 265 |
| Room_Type 7 | 36 | 122 | 158 |
| Room_Type 3 | 2 | 5 | 7 |
- Room_Type 6 shows the highest cancellation rate (~42%), followed by Room_Type 4 (~34%).
- Room_Type 1, the most frequently booked, has a cancellation rate (~32%) consistent with the overall average.
- Room_Types 5 and 7 show lower cancellation rates (~27% and ~23%), potentially indicating more reliable bookings.
- Room_Types 3, 5, and 7 have low booking volume (7, 265, and 158 bookings), so conclusions should be interpreted cautiously.
type_of_meal_plan vs booking_status¶
stacked_barplot(df, "type_of_meal_plan", "booking_status")
| type_of_meal_plan | Canceled | Not_Canceled | All |
|---|---|---|---|
| All | 11885 | 24390 | 36275 |
| Meal Plan 1 | 8679 | 19156 | 27835 |
| Not Selected | 1699 | 3431 | 5130 |
| Meal Plan 2 | 1506 | 1799 | 3305 |
| Meal Plan 3 | 1 | 4 | 5 |
- Meal Plan 2 has the highest cancellation rate (~46%), significantly higher than others.
- Meal Plan 1, the most popular plan, shows a moderate and consistent cancellation rate (~31%).
- Guests who didn't select a meal plan show a similar rate to Meal Plan 1.
- Meal Plan 3 has very few bookings (n=5), so its low cancellation rate is not statistically meaningful.
repeated_guest vs booking_status¶
stacked_barplot(df, "repeated_guest", "booking_status")
| repeated_guest | Canceled | Not_Canceled | All |
|---|---|---|---|
| All | 11885 | 24390 | 36275 |
| 0 | 11869 | 23476 | 35345 |
| 1 | 16 | 914 | 930 |
- Returning guests almost never cancel: Only 16 cancellations out of 930 bookings.
- New guests account for almost all cancellations in the dataset.
no_of_previous_cancellations vs booking_status¶
stacked_barplot(df, "no_of_previous_cancellations", "booking_status")
| no_of_previous_cancellations | Canceled | Not_Canceled | All |
|---|---|---|---|
| All | 11885 | 24390 | 36275 |
| 0 | 11869 | 24068 | 35937 |
| 1 | 11 | 187 | 198 |
| 13 | 4 | 0 | 4 |
| 3 | 1 | 42 | 43 |
| 2 | 0 | 46 | 46 |
| 4 | 0 | 10 | 10 |
| 5 | 0 | 11 | 11 |
| 6 | 0 | 1 | 1 |
| 11 | 0 | 25 | 25 |
- The overwhelming majority of bookings had 0 previous cancellations, and this group had a moderate cancellation rate (~33%).
- Guests with 1+ previous cancellations rarely cancel again (16 of 338 bookings, ~5%), which is counterintuitive.
- The extreme value of 13 previous cancellations shows a 100% cancel rate, but it covers only 4 bookings.
no_of_special_requests vs booking_status¶
stacked_barplot(df, "no_of_special_requests", "booking_status")
| no_of_special_requests | Canceled | Not_Canceled | All |
|---|---|---|---|
| All | 11885 | 24390 | 36275 |
| 0 | 8545 | 11232 | 19777 |
| 1 | 2703 | 8670 | 11373 |
| 2 | 637 | 3727 | 4364 |
| 3 | 0 | 675 | 675 |
| 4 | 0 | 78 | 78 |
| 5 | 0 | 8 | 8 |
There’s a strong inverse relationship between the number of special requests and cancellation rate:
- More requests = lower chance of cancellation
- Guests with 0 requests have the highest cancellation rate (~43%).
- Those with 3 or more requests never cancel in this dataset.
required_car_parking_space vs booking_status¶
stacked_barplot(df, "required_car_parking_space", "booking_status")
| required_car_parking_space | Canceled | Not_Canceled | All |
|---|---|---|---|
| All | 11885 | 24390 | 36275 |
| 0 | 11771 | 23380 | 35151 |
| 1 | 114 | 1010 | 1124 |
- Guests who requested parking (1) show a dramatically lower cancellation rate (~10%) compared to those who didn’t (~33.5%).
- The stacked bar plot shows the same pattern: guests who require parking appear more committed to their bookings.
plt.figure(figsize=(15, 7))
sns.heatmap(df.corr(numeric_only = True), annot=True, vmin=-1, vmax=1, fmt=".2f", cmap="Spectral")
plt.show()
#sns.pairplot(df, hue="booking_status")
#plt.show()
avg_price_per_room vs booking_status¶
distribution_plot_wrt_target(df, "avg_price_per_room", "booking_status")
Not_Canceled:
- Long right tail.
- Most frequent room prices are concentrated between 50–125.
- Distribution is slightly wider compared to cancellations.
Canceled:
- Sharper peak around 100–125.
- Less spread in lower price ranges, indicating fewer low-cost bookings are canceled.
- Slightly more bookings at higher price points.
Boxplots:
With outliers:
- Both groups show outliers > 300, but similar IQRs.
- Median price appears slightly higher for canceled bookings.
Without outliers:
- Canceled bookings have higher median and higher upper quartile.
- Suggests customers canceling rooms may be booking higher-priced accommodations.
lead_time vs booking_status¶
distribution_plot_wrt_target(df, "lead_time", "booking_status")
Not_Canceled:
- Highly right-skewed.
- Most bookings occur within 0–50 days before check-in.
- Sharp decline in density as lead time increases.
Canceled:
- Much flatter and wider distribution.
- Cancellations are more common across a broad range of lead times, especially between 50–250 days.
- Noticeable density from 100 to 300 days, showing long-lead bookings are more likely to cancel.
Boxplots:
With outliers:
- Median lead time for canceled bookings is significantly higher than for not canceled.
- Canceled group has a wider spread and more outliers.
Without outliers:
- Median lead time for canceled bookings is well above that of not canceled (~125 vs ~40 days).
- Interquartile range for canceled bookings is also much wider.
Data Preprocessing¶
- Missing value treatment (if needed)
- Feature engineering (if needed)
- Outlier detection and treatment (if needed)
- Preparing data for modeling
- Any other preprocessing steps (if needed)
Missing value treatment¶
df.isnull().sum()
no_of_adults 0
no_of_children 0
no_of_weekend_nights 0
no_of_week_nights 0
type_of_meal_plan 0
required_car_parking_space 0
room_type_reserved 0
lead_time 0
arrival_year 0
arrival_month 0
arrival_date 0
market_segment_type 0
repeated_guest 0
no_of_previous_cancellations 0
no_of_previous_bookings_not_canceled 0
avg_price_per_room 0
no_of_special_requests 0
booking_status 0
dtype: int64
Outlier detection and treatment¶
numerical_col = df.select_dtypes(include=np.number).columns.tolist()
plt.figure(figsize=(20, 30))
for i, variable in enumerate(numerical_col):
    plt.subplot(5, 4, i + 1)
    plt.boxplot(df[variable], whis=1.5)
    plt.tight_layout()
    plt.title(variable)
plt.show()
# copying data to another variable to avoid any changes to original data
data_clean = df.copy()
data_clean.shape
(36275, 18)
data_clean = data_clean[data_clean['no_of_adults'] != 0]
data_clean = data_clean[~data_clean['no_of_children'].isin([9, 10])]
We removed rows containing outliers in the following columns:
- no_of_adults: Rows where the number of adults was equal to 0 were removed, as such values are unrealistic for a hotel booking.
- no_of_children: Rows where the number of children was equal to 9 or 10 were removed, as these values represent extreme cases and are considered outliers in the context of typical hotel bookings.
This outlier treatment was applied to improve data quality and ensure more reliable model training and analysis.
data_clean.shape
(36133, 18)
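The two filters together remove 142 rows (36,275 down to 36,133). To see how many rows each rule removes on its own, the filters can be kept as named masks. This is a minimal sketch on a toy DataFrame, not the notebook's data; the names toy, rules, and toy_clean are hypothetical.

```python
import pandas as pd

# Toy stand-in for df; the real data has 36,275 rows
toy = pd.DataFrame(
    {
        "no_of_adults": [2, 0, 1, 2, 3, 0],
        "no_of_children": [0, 2, 9, 1, 10, 0],
    }
)

# Each rule is a boolean mask of rows to KEEP
rules = {
    "no_of_adults != 0": toy["no_of_adults"] != 0,
    "no_of_children not in {9, 10}": ~toy["no_of_children"].isin([9, 10]),
}

keep = pd.Series(True, index=toy.index)
for name, mask in rules.items():
    dropped = int((keep & ~mask).sum())  # rows newly removed by this rule
    print(f"{name}: drops {dropped} row(s)")
    keep &= mask

toy_clean = toy[keep]
print(toy.shape, "->", toy_clean.shape)
```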
Preparing data for modeling¶
Let's encode Canceled bookings to 1 and Not_Canceled as 0 for further analysis
data_clean["booking_status"] = data_clean["booking_status"].apply(
    lambda x: 1 if x == "Canceled" else 0
)
data_clean.info()
<class 'pandas.core.frame.DataFrame'> Index: 36133 entries, 0 to 36274 Data columns (total 18 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 no_of_adults 36133 non-null int64 1 no_of_children 36133 non-null int64 2 no_of_weekend_nights 36133 non-null int64 3 no_of_week_nights 36133 non-null int64 4 type_of_meal_plan 36133 non-null object 5 required_car_parking_space 36133 non-null int64 6 room_type_reserved 36133 non-null object 7 lead_time 36133 non-null int64 8 arrival_year 36133 non-null int64 9 arrival_month 36133 non-null int64 10 arrival_date 36133 non-null int64 11 market_segment_type 36133 non-null object 12 repeated_guest 36133 non-null int64 13 no_of_previous_cancellations 36133 non-null int64 14 no_of_previous_bookings_not_canceled 36133 non-null int64 15 avg_price_per_room 36133 non-null float64 16 no_of_special_requests 36133 non-null int64 17 booking_status 36133 non-null int64 dtypes: float64(1), int64(14), object(3) memory usage: 5.2+ MB
Creating training and test sets.¶
# specifying the independent and dependent variables
X = data_clean.drop(["booking_status"], axis=1)
Y = data_clean["booking_status"]
# creating dummy variables
X = pd.get_dummies(X, drop_first=True)
X = X.astype(float)
# adding constant
X = sm.add_constant(X)
# splitting in training and test set
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.3, random_state=1)
print("Shape of Training set : ", X_train.shape)
print("Shape of test set : ", X_test.shape)
print("Percentage of classes in training set:")
print(y_train.value_counts(normalize=True))
print("Percentage of classes in test set:")
print(y_test.value_counts(normalize=True))
Shape of Training set : (25293, 28)
Shape of test set : (10840, 28)
Percentage of classes in training set:
booking_status
0 0.674179
1 0.325821
Name: proportion, dtype: float64
Percentage of classes in test set:
booking_status
0 0.667989
1 0.332011
Name: proportion, dtype: float64
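The split above is not stratified, so the share of canceled bookings differs slightly between the two sets (about 32.6% in training versus 33.2% in test). If identical class shares are wanted, train_test_split accepts a stratify argument. The sketch below uses synthetic data, not the notebook's X and Y, and is an optional variation rather than what the original analysis did.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic labels with roughly the same class balance as booking_status
rng = np.random.default_rng(1)
y_demo = (rng.random(10_000) < 0.33).astype(int)
X_demo = rng.normal(size=(10_000, 3))

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=1, stratify=y_demo
)

# With stratify, both splits carry (almost) the same share of class 1
print(round(y_tr.mean(), 4), round(y_te.mean(), 4))
```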
X_train.info()
<class 'pandas.core.frame.DataFrame'> Index: 25293 entries, 22036 to 33132 Data columns (total 28 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 const 25293 non-null float64 1 no_of_adults 25293 non-null float64 2 no_of_children 25293 non-null float64 3 no_of_weekend_nights 25293 non-null float64 4 no_of_week_nights 25293 non-null float64 5 required_car_parking_space 25293 non-null float64 6 lead_time 25293 non-null float64 7 arrival_year 25293 non-null float64 8 arrival_month 25293 non-null float64 9 arrival_date 25293 non-null float64 10 repeated_guest 25293 non-null float64 11 no_of_previous_cancellations 25293 non-null float64 12 no_of_previous_bookings_not_canceled 25293 non-null float64 13 avg_price_per_room 25293 non-null float64 14 no_of_special_requests 25293 non-null float64 15 type_of_meal_plan_Meal Plan 2 25293 non-null float64 16 type_of_meal_plan_Meal Plan 3 25293 non-null float64 17 type_of_meal_plan_Not Selected 25293 non-null float64 18 room_type_reserved_Room_Type 2 25293 non-null float64 19 room_type_reserved_Room_Type 3 25293 non-null float64 20 room_type_reserved_Room_Type 4 25293 non-null float64 21 room_type_reserved_Room_Type 5 25293 non-null float64 22 room_type_reserved_Room_Type 6 25293 non-null float64 23 room_type_reserved_Room_Type 7 25293 non-null float64 24 market_segment_type_Complementary 25293 non-null float64 25 market_segment_type_Corporate 25293 non-null float64 26 market_segment_type_Offline 25293 non-null float64 27 market_segment_type_Online 25293 non-null float64 dtypes: float64(28) memory usage: 5.6 MB
X_train.head()
| | const | no_of_adults | no_of_children | no_of_weekend_nights | no_of_week_nights | required_car_parking_space | lead_time | arrival_year | arrival_month | arrival_date | repeated_guest | no_of_previous_cancellations | no_of_previous_bookings_not_canceled | avg_price_per_room | no_of_special_requests | type_of_meal_plan_Meal Plan 2 | type_of_meal_plan_Meal Plan 3 | type_of_meal_plan_Not Selected | room_type_reserved_Room_Type 2 | room_type_reserved_Room_Type 3 | room_type_reserved_Room_Type 4 | room_type_reserved_Room_Type 5 | room_type_reserved_Room_Type 6 | room_type_reserved_Room_Type 7 | market_segment_type_Complementary | market_segment_type_Corporate | market_segment_type_Offline | market_segment_type_Online |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 22036 | 1.0 | 2.0 | 0.0 | 0.0 | 1.0 | 0.0 | 55.0 | 2018.0 | 4.0 | 6.0 | 0.0 | 0.0 | 0.0 | 104.00 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| 35385 | 1.0 | 2.0 | 0.0 | 1.0 | 3.0 | 0.0 | 127.0 | 2018.0 | 7.0 | 25.0 | 0.0 | 0.0 | 0.0 | 89.25 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| 194 | 1.0 | 2.0 | 0.0 | 4.0 | 10.0 | 0.0 | 147.0 | 2018.0 | 8.0 | 3.0 | 0.0 | 0.0 | 0.0 | 118.88 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| 6705 | 1.0 | 2.0 | 1.0 | 1.0 | 2.0 | 0.0 | 91.0 | 2018.0 | 5.0 | 13.0 | 0.0 | 0.0 | 0.0 | 140.40 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| 1830 | 1.0 | 2.0 | 0.0 | 1.0 | 2.0 | 0.0 | 19.0 | 2018.0 | 9.0 | 19.0 | 0.0 | 0.0 | 0.0 | 95.00 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
Model Building¶
# defining a function to compute different metrics to check performance of a classification model built using statsmodels
def model_performance_classification_statsmodels(
    model, predictors, target, threshold=0.5
):
    """
    Function to compute different metrics to check classification model performance
    model: classifier
    predictors: independent variables
    target: dependent variable
    threshold: threshold for classifying the observation as class 1
    """
    # checking which probabilities are greater than threshold
    pred_temp = model.predict(predictors) > threshold
    # rounding off the above values to get classes
    pred = np.round(pred_temp)
    acc = accuracy_score(target, pred)  # to compute Accuracy
    recall = recall_score(target, pred)  # to compute Recall
    precision = precision_score(target, pred)  # to compute Precision
    f1 = f1_score(target, pred)  # to compute F1-score
    # creating a dataframe of metrics
    df_perf = pd.DataFrame(
        {"Accuracy": acc, "Recall": recall, "Precision": precision, "F1": f1,},
        index=[0],
    )
    return df_perf
# defining a function to plot the confusion_matrix of a classification model
def confusion_matrix_statsmodels(model, predictors, target, threshold=0.5):
    """
    To plot the confusion_matrix with percentages
    model: classifier
    predictors: independent variables
    target: dependent variable
    threshold: threshold for classifying the observation as class 1
    """
    y_pred = model.predict(predictors) > threshold
    cm = confusion_matrix(target, y_pred)
    labels = np.asarray(
        [
            ["{0:0.0f}".format(item) + "\n{0:.2%}".format(item / cm.flatten().sum())]
            for item in cm.flatten()
        ]
    ).reshape(2, 2)
    plt.figure(figsize=(6, 4))
    sns.heatmap(cm, annot=labels, fmt="")
    plt.ylabel("True label")
    plt.xlabel("Predicted label")
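The four metrics returned by model_performance_classification_statsmodels all derive from the cells of the confusion matrix. The following minimal sketch makes that relationship explicit; the counts are made up for illustration and are not model output, and the helper name metrics_from_counts is hypothetical.

```python
def metrics_from_counts(tp, fp, fn, tn):
    """Accuracy, recall, precision and F1 for the positive class (1 = Canceled)."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    recall = tp / (tp + fn)        # share of actual cancellations caught
    precision = tp / (tp + fp)     # share of predicted cancellations that were real
    f1 = 2 * precision * recall / (precision + recall)
    return {"Accuracy": accuracy, "Recall": recall, "Precision": precision, "F1": f1}

# Hypothetical confusion-matrix cells, for illustration only
print(metrics_from_counts(tp=60, fp=20, fn=40, tn=180))
```

For a cancellation model, recall measures how many cancellations the hotel is warned about, while precision measures how many warnings turn out to be real.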
Logistic Regression (with statsmodels library)¶
Fitting the Model¶
# fitting logistic regression model
logit = sm.Logit(y_train, X_train.astype(float))
lg = logit.fit(disp=False)
print(lg.summary())
Logit Regression Results
==============================================================================
Dep. Variable: booking_status No. Observations: 25293
Model: Logit Df Residuals: 25265
Method: MLE Df Model: 27
Date: Sun, 22 Jun 2025 Pseudo R-squ.: 0.3270
Time: 23:51:33 Log-Likelihood: -10745.
converged: False LL-Null: -15964.
Covariance Type: nonrobust LLR p-value: 0.000
========================================================================================================
coef std err z P>|z| [0.025 0.975]
--------------------------------------------------------------------------------------------------------
const -812.1594 120.546 -6.737 0.000 -1048.425 -575.894
no_of_adults 0.0914 0.039 2.371 0.018 0.016 0.167
no_of_children 0.0970 0.067 1.442 0.149 -0.035 0.229
no_of_weekend_nights 0.1310 0.020 6.569 0.000 0.092 0.170
no_of_week_nights 0.0270 0.012 2.186 0.029 0.003 0.051
required_car_parking_space -1.5312 0.136 -11.249 0.000 -1.798 -1.264
lead_time 0.0156 0.000 58.585 0.000 0.015 0.016
arrival_year 0.4012 0.060 6.717 0.000 0.284 0.518
arrival_month -0.0415 0.007 -6.366 0.000 -0.054 -0.029
arrival_date 0.0016 0.002 0.814 0.415 -0.002 0.005
repeated_guest -2.3550 0.547 -4.304 0.000 -3.427 -1.283
no_of_previous_cancellations 0.1565 0.095 1.656 0.098 -0.029 0.342
no_of_previous_bookings_not_canceled -0.0406 0.086 -0.470 0.638 -0.210 0.129
avg_price_per_room 0.0185 0.001 25.139 0.000 0.017 0.020
no_of_special_requests -1.4908 0.030 -49.033 0.000 -1.550 -1.431
type_of_meal_plan_Meal Plan 2 0.1417 0.067 2.115 0.034 0.010 0.273
type_of_meal_plan_Meal Plan 3 31.7693 5.52e+06 5.76e-06 1.000 -1.08e+07 1.08e+07
type_of_meal_plan_Not Selected 0.1879 0.053 3.526 0.000 0.083 0.292
room_type_reserved_Room_Type 2 -0.3395 0.146 -2.331 0.020 -0.625 -0.054
room_type_reserved_Room_Type 3 -0.0987 1.258 -0.078 0.937 -2.565 2.368
room_type_reserved_Room_Type 4 -0.2824 0.053 -5.287 0.000 -0.387 -0.178
room_type_reserved_Room_Type 5 -0.7717 0.220 -3.502 0.000 -1.204 -0.340
room_type_reserved_Room_Type 6 -0.8480 0.157 -5.394 0.000 -1.156 -0.540
room_type_reserved_Room_Type 7 -1.4010 0.301 -4.649 0.000 -1.992 -0.810
market_segment_type_Complementary -69.7074 1.01e+09 -6.93e-08 1.000 -1.97e+09 1.97e+09
market_segment_type_Corporate -1.1294 0.270 -4.183 0.000 -1.659 -0.600
market_segment_type_Offline -2.0268 0.259 -7.818 0.000 -2.535 -1.519
market_segment_type_Online -0.2408 0.256 -0.940 0.347 -0.743 0.261
========================================================================================================
C:\Users\andre\anaconda3\Lib\site-packages\statsmodels\base\model.py:607: ConvergenceWarning: Maximum Likelihood optimization failed to converge. Check mle_retvals
warnings.warn("Maximum Likelihood optimization failed to "
confusion_matrix_statsmodels(lg, X_train, y_train)
print("Training performance:")
model_performance_classification_statsmodels(lg, X_train, y_train)
Training performance:
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.804887 | 0.62456 | 0.736548 | 0.675947 |
Observations
Model Fit Summary¶
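- Pseudo R-squared: 0.327, which indicates a moderate model fit.
- Log-Likelihood: -10745
- LLR p-value: 0.000, so the model is statistically significant compared to the null model.
- The optimizer did not converge (converged: False), and the coefficients for type_of_meal_plan_Meal Plan 3 and market_segment_type_Complementary have extremely large standard errors, so those two estimates are not reliable.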
Significant Features (p < 0.05)¶
These variables significantly contribute to the model:
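- no_of_adults
- no_of_weekend_nights
- no_of_week_nights
- required_car_parking_space
- lead_time
- arrival_year
- arrival_month
- repeated_guest
- avg_price_per_room
- no_of_special_requests
- type_of_meal_plan_Meal Plan 2
- type_of_meal_plan_Not Selected
- room_type_reserved_Room_Type 2
- room_type_reserved_Room_Type 4
- room_type_reserved_Room_Type 5
- room_type_reserved_Room_Type 6
- room_type_reserved_Room_Type 7
- market_segment_type_Corporate
- market_segment_type_Offline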
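The summary reports converged: False, and the coefficient for market_segment_type_Complementary has an enormous standard error. One common cause is (quasi-)complete separation: a dummy level for which one class never occurs, as with the Complementary segment, which has 0 cancellations in the EDA crosstab. The sketch below flags such levels; it uses counts copied from that crosstab (taken before the outlier rows were removed), and the helper name separated_levels is hypothetical.

```python
def separated_levels(counts):
    """Return levels where one of the two classes has zero bookings."""
    return [
        level
        for level, (canceled, not_canceled) in counts.items()
        if canceled == 0 or not_canceled == 0
    ]

# (Canceled, Not_Canceled) per market segment, from the EDA crosstab
segment_counts = {
    "Online": (8475, 14739),
    "Offline": (3153, 7375),
    "Corporate": (220, 1797),
    "Aviation": (37, 88),
    "Complementary": (0, 391),
}
print(separated_levels(segment_counts))
```

Levels flagged this way are candidates for merging with another level or dropping before fitting, since maximum likelihood cannot produce a finite estimate for them.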
Multicollinearity¶
Additional Information on VIF¶
Variance Inflation factor: Variance inflation factors measure the inflation in the variances of the regression coefficients estimates due to collinearity that exist among the predictors. It is a measure of how much the variance of the estimated regression coefficient βk is "inflated" by the existence of correlation among the predictor variables in the model.
General Rule of thumb: If VIF is 1, there is no correlation between the kth predictor and the remaining predictor variables, and hence the variance of β̂k is not inflated at all. If VIF exceeds 5, we say there is moderate multicollinearity, and if it is 10 or more, it shows signs of high multicollinearity. The purpose of the analysis should dictate which threshold to use.
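In symbols (the standard definition, stated here for reference rather than taken from the notebook), the VIF of the $k$-th predictor is

$$\mathrm{VIF}_k = \frac{1}{1 - R_k^2}$$

where $R_k^2$ is the coefficient of determination obtained by regressing the $k$-th predictor on all the other predictors. For example, a VIF of 70.3 corresponds to $R_k^2 = 1 - 1/70.3 \approx 0.986$, meaning the other predictors explain almost all of the variation in that column.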
vif_series = pd.Series(
    [variance_inflation_factor(X_train.values, i) for i in range(X_train.shape[1])],
    index=X_train.columns,
    dtype=float,
)
print("Series before feature selection: \n\n{}\n".format(vif_series))
Series before feature selection:

const 3.930676e+07
no_of_adults 1.336501e+00
no_of_children 2.223153e+00
no_of_weekend_nights 1.065835e+00
no_of_week_nights 1.094993e+00
required_car_parking_space 1.034825e+00
lead_time 1.387834e+00
arrival_year 1.418583e+00
arrival_month 1.269876e+00
arrival_date 1.007902e+00
repeated_guest 1.779255e+00
no_of_previous_cancellations 1.335907e+00
no_of_previous_bookings_not_canceled 1.629765e+00
avg_price_per_room 2.044343e+00
no_of_special_requests 1.254177e+00
type_of_meal_plan_Meal Plan 2 1.260908e+00
type_of_meal_plan_Meal Plan 3 1.016220e+00
type_of_meal_plan_Not Selected 1.272762e+00
room_type_reserved_Room_Type 2 1.030092e+00
room_type_reserved_Room_Type 3 1.002993e+00
room_type_reserved_Room_Type 4 1.367139e+00
room_type_reserved_Room_Type 5 1.027521e+00
room_type_reserved_Room_Type 6 2.213321e+00
room_type_reserved_Room_Type 7 1.107853e+00
market_segment_type_Complementary 4.373069e+00
market_segment_type_Corporate 1.705492e+01
market_segment_type_Offline 6.317676e+01
market_segment_type_Online 7.030867e+01
dtype: float64
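High VIF values (ignoring const):
- market_segment_type_Online: 70.3 → severe multicollinearity
- market_segment_type_Offline: 63.2 → severe multicollinearity
- market_segment_type_Corporate: 17.1 → high multicollinearity
- market_segment_type_Complementary: 4.37 → just below the moderate threshold of 5
All remaining features have VIF below 2.3 → no multicollinearity concern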
- Remove or consolidate features with high VIF (e.g., market segment dummies).
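The market segment dummies are collinear because the level dropped by drop_first=True is Aviation, which has only 125 bookings, so the four remaining dummies add up to almost exactly the constant column. The sketch below reproduces this effect using only the segment sizes from the EDA crosstab and a hand-written VIF; it ignores all other features, so the numbers will not match the output above exactly. The helper names vif and design are hypothetical.

```python
import numpy as np

def vif(X, k):
    """VIF of column k: regress it on the remaining columns (X includes a constant)."""
    y = X[:, k]
    others = np.delete(X, k, axis=1)
    beta, *_ = np.linalg.lstsq(others, y, rcond=None)
    resid = y - others @ beta
    r2 = 1 - resid.var() / y.var()
    return 1 / (1 - r2)

# Segment sizes from the EDA crosstab; Aviation is the dropped baseline level
sizes = {"Aviation": 125, "Complementary": 391, "Corporate": 2017,
         "Offline": 10528, "Online": 23214}
labels = np.repeat(list(sizes), list(sizes.values()))

def design(levels):
    """Constant column plus one 0/1 dummy per requested level."""
    dummies = [(labels == lvl).astype(float) for lvl in levels]
    return np.column_stack([np.ones(len(labels))] + dummies)

full = design(["Complementary", "Corporate", "Offline", "Online"])
reduced = design(["Complementary", "Corporate", "Offline"])

vif_online = vif(full, 4)      # Online dummy, with all four dummies present
vif_offline = vif(reduced, 3)  # Offline dummy, after dropping Online
print(round(vif_online, 1), round(vif_offline, 2))
```

Dropping the largest dummy effectively makes Online (together with Aviation) the reference group, which is why the remaining VIFs fall back close to 1.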
Dropping market_segment_type_Online
X_train1 = X_train.drop(["market_segment_type_Online"], axis=1)
vif_series = pd.Series(
    [variance_inflation_factor(X_train1.values, i) for i in range(X_train1.shape[1])],
    index=X_train1.columns,
    dtype=float,
)
print("Series before feature selection: \n\n{}\n".format(vif_series))
Series before feature selection:

const 3.921797e+07
no_of_adults 1.319816e+00
no_of_children 2.222874e+00
no_of_weekend_nights 1.065826e+00
no_of_week_nights 1.094984e+00
required_car_parking_space 1.034781e+00
lead_time 1.383938e+00
arrival_year 1.415728e+00
arrival_month 1.268335e+00
arrival_date 1.007860e+00
repeated_guest 1.775096e+00
no_of_previous_cancellations 1.335819e+00
no_of_previous_bookings_not_canceled 1.629337e+00
avg_price_per_room 2.043155e+00
no_of_special_requests 1.249389e+00
type_of_meal_plan_Meal Plan 2 1.260590e+00
type_of_meal_plan_Meal Plan 3 1.016220e+00
type_of_meal_plan_Not Selected 1.270712e+00
room_type_reserved_Room_Type 2 1.030084e+00
room_type_reserved_Room_Type 3 1.002993e+00
room_type_reserved_Room_Type 4 1.360723e+00
room_type_reserved_Room_Type 5 1.027513e+00
room_type_reserved_Room_Type 6 2.213106e+00
room_type_reserved_Room_Type 7 1.107723e+00
market_segment_type_Complementary 1.324340e+00
market_segment_type_Corporate 1.521487e+00
market_segment_type_Offline 1.596809e+00
dtype: float64
logit1 = sm.Logit(y_train, X_train1.astype(float))
lg1 = logit1.fit()
print(lg1.summary())
Warning: Maximum number of iterations has been exceeded.
Current function value: 0.424823
Iterations: 35
Logit Regression Results
==============================================================================
Dep. Variable: booking_status No. Observations: 25293
Model: Logit Df Residuals: 25266
Method: MLE Df Model: 26
Date: Sun, 22 Jun 2025 Pseudo R-squ.: 0.3269
Time: 23:51:37 Log-Likelihood: -10745.
converged: False LL-Null: -15964.
Covariance Type: nonrobust LLR p-value: 0.000
========================================================================================================
coef std err z P>|z| [0.025 0.975]
--------------------------------------------------------------------------------------------------------
const -817.7149 120.378 -6.793 0.000 -1053.651 -581.779
no_of_adults 0.0868 0.038 2.270 0.023 0.012 0.162
no_of_children 0.0962 0.067 1.430 0.153 -0.036 0.228
no_of_weekend_nights 0.1311 0.020 6.573 0.000 0.092 0.170
no_of_week_nights 0.0270 0.012 2.193 0.028 0.003 0.051
required_car_parking_space -1.5298 0.136 -11.239 0.000 -1.797 -1.263
lead_time 0.0156 0.000 58.667 0.000 0.015 0.016
arrival_year 0.4039 0.060 6.770 0.000 0.287 0.521
arrival_month -0.0412 0.007 -6.329 0.000 -0.054 -0.028
arrival_date 0.0016 0.002 0.821 0.412 -0.002 0.005
repeated_guest -2.3314 0.548 -4.254 0.000 -3.406 -1.257
no_of_previous_cancellations 0.1547 0.094 1.637 0.102 -0.031 0.340
no_of_previous_bookings_not_canceled -0.0413 0.086 -0.478 0.633 -0.211 0.128
avg_price_per_room 0.0185 0.001 25.131 0.000 0.017 0.020
no_of_special_requests -1.4919 0.030 -49.105 0.000 -1.551 -1.432
type_of_meal_plan_Meal Plan 2 0.1432 0.067 2.139 0.032 0.012 0.274
type_of_meal_plan_Meal Plan 3 13.2330 517.244 0.026 0.980 -1000.546 1027.013
type_of_meal_plan_Not Selected 0.1855 0.053 3.485 0.000 0.081 0.290
room_type_reserved_Room_Type 2 -0.3398 0.146 -2.333 0.020 -0.625 -0.054
room_type_reserved_Room_Type 3 -0.0993 1.259 -0.079 0.937 -2.566 2.367
room_type_reserved_Room_Type 4 -0.2788 0.053 -5.233 0.000 -0.383 -0.174
room_type_reserved_Room_Type 5 -0.7715 0.220 -3.501 0.000 -1.203 -0.340
room_type_reserved_Room_Type 6 -0.8455 0.157 -5.379 0.000 -1.154 -0.537
room_type_reserved_Room_Type 7 -1.3963 0.301 -4.634 0.000 -1.987 -0.806
market_segment_type_Complementary -21.0468 608.695 -0.035 0.972 -1214.067 1171.973
market_segment_type_Corporate -0.8945 0.103 -8.699 0.000 -1.096 -0.693
market_segment_type_Offline -1.7880 0.052 -34.432 0.000 -1.890 -1.686
========================================================================================================
C:\Users\andre\anaconda3\Lib\site-packages\statsmodels\base\model.py:607: ConvergenceWarning: Maximum Likelihood optimization failed to converge. Check mle_retvals
warnings.warn("Maximum Likelihood optimization failed to "
Removing high p-value variables¶
Several variables still have p-values above 0.05 (for example arrival_date, no_of_children, and a few dummy levels). Because p-values can change once a variable is dropped, we will not drop them all at once but remove them iteratively.
Instead, we will do the following repeatedly using a loop:
- Build a model, check the p-values of the variables, and drop the column with the highest p-value.
- Create a new model without the dropped feature, check the p-values of the variables, and drop the column with the highest p-value.
- Repeat the above two steps till there are no columns with p-value > 0.05.
Note: The above process can also be done manually by picking one variable at a time that has a high p-value, dropping it, and building a model again. But that might be a little tedious and using a loop will be more efficient.
# initial list of columns
cols = X_train1.columns.tolist()
# setting an initial max p-value
max_p_value = 1
while len(cols) > 0:
    # defining the train set
    X_train_aux = X_train1[cols]
    # fitting the model
    model = sm.Logit(y_train, X_train_aux).fit(disp=False)
    # getting the p-values and the maximum p-value
    p_values = model.pvalues
    max_p_value = max(p_values)
    # name of the variable with maximum p-value
    feature_with_p_max = p_values.idxmax()
    if max_p_value > 0.05:
        cols.remove(feature_with_p_max)
    else:
        break
selected_features = cols
print(selected_features)
C:\Users\andre\anaconda3\Lib\site-packages\statsmodels\base\model.py:607: ConvergenceWarning: Maximum Likelihood optimization failed to converge. Check mle_retvals
warnings.warn("Maximum Likelihood optimization failed to "
C:\Users\andre\anaconda3\Lib\site-packages\statsmodels\base\model.py:607: ConvergenceWarning: Maximum Likelihood optimization failed to converge. Check mle_retvals
warnings.warn("Maximum Likelihood optimization failed to "
['const', 'no_of_adults', 'no_of_weekend_nights', 'no_of_week_nights', 'required_car_parking_space', 'lead_time', 'arrival_year', 'arrival_month', 'repeated_guest', 'avg_price_per_room', 'no_of_special_requests', 'type_of_meal_plan_Not Selected', 'room_type_reserved_Room_Type 2', 'room_type_reserved_Room_Type 4', 'room_type_reserved_Room_Type 5', 'room_type_reserved_Room_Type 6', 'room_type_reserved_Room_Type 7', 'market_segment_type_Corporate', 'market_segment_type_Offline']
- The above columns are the significant columns.
- Here the loop kept the constant, but it treats 'const' like any other column and could drop it if its p-value exceeded 0.05. We need the constant to build the logistic regression model, so along with the significant variables we always keep the 'const' column.
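Because the loop treats 'const' like any other column, one way to guarantee that the intercept survives is to exclude it from the drop candidates. The following is a minimal sketch of that variant, not the notebook's code: backward_eliminate and pvalue_fn are hypothetical names, and the p-values are made up so the example runs without fitting a model.

```python
def backward_eliminate(columns, pvalue_fn, alpha=0.05, keep=("const",)):
    """Drop the least significant column until all remaining p-values are <= alpha.

    pvalue_fn(cols) must return a {column: p_value} dict for a model fit on cols.
    Columns listed in keep are never dropped.
    """
    cols = list(columns)
    while True:
        pvals = pvalue_fn(cols)
        candidates = {c: p for c, p in pvals.items() if c not in keep}
        if not candidates:
            break
        worst = max(candidates, key=candidates.get)
        if candidates[worst] <= alpha:
            break
        cols.remove(worst)
    return cols

# Stand-in for a real model fit: fixed, made-up p-values for illustration
fake_p = {"const": 0.40, "lead_time": 0.001, "arrival_date": 0.41, "no_of_children": 0.15}
selected = backward_eliminate(fake_p, lambda cols: {c: fake_p[c] for c in cols})
print(selected)
```

With statsmodels, pvalue_fn could plausibly be written as lambda cols: sm.Logit(y_train, X_train1[cols]).fit(disp=False).pvalues.to_dict(), assuming the same y_train and X_train1 as above.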
X_train2 = X_train1[selected_features]
logit2 = sm.Logit(y_train, X_train2.astype(float))
lg2 = logit2.fit(disp=False)
print(lg2.summary())
Logit Regression Results
==============================================================================
Dep. Variable: booking_status No. Observations: 25293
Model: Logit Df Residuals: 25274
Method: MLE Df Model: 18
Date: Sun, 22 Jun 2025 Pseudo R-squ.: 0.3258
Time: 23:51:39 Log-Likelihood: -10764.
converged: True LL-Null: -15964.
Covariance Type: nonrobust LLR p-value: 0.000
==================================================================================================
coef std err z P>|z| [0.025 0.975]
--------------------------------------------------------------------------------------------------
const -747.5124 116.529 -6.415 0.000 -975.905 -519.120
no_of_adults 0.0890 0.038 2.334 0.020 0.014 0.164
no_of_weekend_nights 0.1335 0.020 6.701 0.000 0.094 0.173
no_of_week_nights 0.0275 0.012 2.236 0.025 0.003 0.052
required_car_parking_space -1.5387 0.136 -11.313 0.000 -1.805 -1.272
lead_time 0.0157 0.000 60.065 0.000 0.015 0.016
arrival_year 0.3690 0.058 6.390 0.000 0.256 0.482
arrival_month -0.0445 0.006 -6.898 0.000 -0.057 -0.032
repeated_guest -2.1584 0.415 -5.205 0.000 -2.971 -1.346
avg_price_per_room 0.0194 0.001 28.019 0.000 0.018 0.021
no_of_special_requests -1.4858 0.030 -49.098 0.000 -1.545 -1.426
type_of_meal_plan_Not Selected 0.1930 0.053 3.644 0.000 0.089 0.297
room_type_reserved_Room_Type 2 -0.3178 0.145 -2.194 0.028 -0.602 -0.034
room_type_reserved_Room_Type 4 -0.3002 0.053 -5.689 0.000 -0.404 -0.197
room_type_reserved_Room_Type 5 -0.7868 0.220 -3.581 0.000 -1.217 -0.356
room_type_reserved_Room_Type 6 -0.7487 0.116 -6.442 0.000 -0.977 -0.521
room_type_reserved_Room_Type 7 -1.4019 0.294 -4.763 0.000 -1.979 -0.825
market_segment_type_Corporate -0.8891 0.103 -8.667 0.000 -1.090 -0.688
market_segment_type_Offline -1.7580 0.050 -34.933 0.000 -1.857 -1.659
==================================================================================================
Now no feature has a p-value greater than 0.05, so we'll consider the features in X_train2 as the final ones and lg2 as the final model.
Coefficient Interpretations¶
Coefficients of no_of_adults, no_of_weekend_nights, no_of_week_nights, lead_time, arrival_year, avg_price_per_room, and type_of_meal_plan_Not Selected are positive: an increase in these raises the chance of a booking being canceled.
Coefficients of required_car_parking_space, arrival_month, repeated_guest, no_of_special_requests, room types 2, 4, 5, 6, and 7, and the Corporate and Offline market segments are negative: an increase in these lowers the chance of a booking being canceled.
Converting coefficients to odds
- The coefficients ($\beta$s) of the logistic regression model are in terms of $log(odds)$ and to find the odds, we have to take the exponential of the coefficients
- Therefore, $odds = exp(\beta)$
- The percentage change in odds is given as $(exp(\beta) - 1) * 100$
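As a worked example of the conversion, the sketch below applies the formulas to two coefficients copied from the lg2 summary above (rounded as printed, so the results differ from the unrounded values in the last decimals). The variable names are illustrative.

```python
import math

# Coefficients copied from the lg2 summary above (rounded as printed).
coefs = {"lead_time": 0.0157, "repeated_guest": -2.1584}

for name, beta in coefs.items():
    odds_ratio = math.exp(beta)          # multiplicative change in odds per unit
    pct_change = (odds_ratio - 1) * 100  # the same change as a percentage
    print(f"{name}: odds ratio = {odds_ratio:.4f}, change in odds = {pct_change:.2f}%")

# Effects multiply on the odds scale: 10 extra days of lead time
# scale the odds by exp(10 * beta), not by 10 * exp(beta).
ten_day_ratio = math.exp(10 * coefs["lead_time"])
print(f"10 extra days of lead time: odds ratio = {ten_day_ratio:.4f}")
```

Each extra day of lead time raises the odds of cancellation by roughly 1.6%, while ten extra days compound to roughly 17%.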
# converting coefficients to odds
odds = np.exp(lg2.params)
# finding the percentage change
perc_change_odds = (np.exp(lg2.params) - 1) * 100
# removing limit from number of columns to display
pd.set_option("display.max_columns", None)
# adding the odds to a dataframe
pd.DataFrame({"Odds": odds, "Change_odd%": perc_change_odds}, index=X_train2.columns).T
| | const | no_of_adults | no_of_weekend_nights | no_of_week_nights | required_car_parking_space | lead_time | arrival_year | arrival_month | repeated_guest | avg_price_per_room | no_of_special_requests | type_of_meal_plan_Not Selected | room_type_reserved_Room_Type 2 | room_type_reserved_Room_Type 4 | room_type_reserved_Room_Type 5 | room_type_reserved_Room_Type 6 | room_type_reserved_Room_Type 7 | market_segment_type_Corporate | market_segment_type_Offline |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Odds | 0.0 | 1.093082 | 1.142814 | 1.027896 | 0.214652 | 1.015864 | 1.446332 | 0.956431 | 0.115513 | 1.019630 | 0.226320 | 1.212915 | 0.727774 | 0.740641 | 0.455277 | 0.472971 | 0.246122 | 0.411015 | 0.172383 |
| Change_odd% | -100.0 | 9.308172 | 14.281429 | 2.789626 | -78.534835 | 1.586375 | 44.633195 | -4.356921 | -88.448716 | 1.962996 | -77.367961 | 21.291545 | -27.222644 | -25.935922 | -54.472272 | -52.702948 | -75.387831 | -58.898459 | -82.761687 |
- Features such as lead_time, no_of_adults, and type_of_meal_plan_Not Selected increase the odds of cancellation.
- The strongest reductions in the odds of cancellation come from:
  - repeated_guest and no_of_special_requests,
  - room types 5–7,
  - the corporate and offline market segments.
These can inform targeted interventions or predictive triggers for your booking pipeline.
Checking performance of the new model¶
Training set performance
# creating confusion matrix
confusion_matrix_statsmodels(lg2, X_train2, y_train)
log_reg_model_train_perf = model_performance_classification_statsmodels(
lg2, X_train2, y_train
)
print("Training performance:")
log_reg_model_train_perf
Training performance:
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.804175 | 0.624196 | 0.734857 | 0.675021 |
Test set performance
X_test2 = X_test[list(X_train2.columns)]
# creating confusion matrix
confusion_matrix_statsmodels(lg2, X_test2, y_test)
log_reg_model_test_perf = model_performance_classification_statsmodels(
lg2, X_test2, y_test
)
print("Test performance:")
log_reg_model_test_perf
Test performance:
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.805351 | 0.630453 | 0.744178 | 0.682611 |
- The reduced model (lg2) generalizes well: test-set F1 (0.683) is slightly higher than training-set F1 (0.675).
- No signs of overfitting: all four metrics differ by less than 0.01 between the train and test splits.
- Precision-recall balance improved, making the model more useful in real-world booking prediction scenarios.
Model Performance Improvement¶
- Let's see if the f1_score can be improved further by changing the model threshold
- First, we will check the ROC curve, compute the area under the ROC curve (ROC-AUC), and then use it to find the optimal threshold
- Next, we will check the Precision-Recall curve to find the right balance between precision and recall as our metric of choice is f1_score
ROC Curve and ROC-AUC¶
- ROC-AUC on training set
logit_roc_auc_train = roc_auc_score(y_train, lg2.predict(X_train2))
fpr, tpr, thresholds = roc_curve(y_train, lg2.predict(X_train2))
plt.figure(figsize=(7, 5))
plt.plot(fpr, tpr, label="Logistic Regression (area = %0.2f)" % logit_roc_auc_train)
plt.plot([0, 1], [0, 1], "r--")
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("Receiver operating characteristic")
plt.legend(loc="lower right")
plt.show()
- The logistic regression model performs well on the training set.
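ROC-AUC can be read as the probability that a randomly chosen positive (here, a canceled booking) receives a higher score than a randomly chosen negative. The dependency-free sketch below computes it that way on made-up scores. `pairwise_auc` and the toy data are hypothetical, and the function is only an illustration of what `roc_auc_score` measures, not a replacement for it.

```python
def pairwise_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up labels and predicted probabilities (illustration only).
y_toy = [0, 0, 1, 0, 0, 1, 1, 1]
s_toy = [0.1, 0.2, 0.3, 0.35, 0.4, 0.6, 0.7, 0.9]
auc_toy = pairwise_auc(y_toy, s_toy)
print(auc_toy)
```

A value of 0.5 corresponds to the red diagonal in the plot (random ranking), and 1.0 to a perfect ranking.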
Optimal threshold using AUC-ROC curve¶
# Optimal threshold as per AUC-ROC curve
# The optimal cut off would be where tpr is high and fpr is low
fpr, tpr, thresholds = roc_curve(y_train, lg2.predict(X_train2))
optimal_idx = np.argmax(tpr - fpr)
optimal_threshold_auc_roc = thresholds[optimal_idx]
print(optimal_threshold_auc_roc)
0.34387406627651484
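The rule used above, `np.argmax(tpr - fpr)`, picks the threshold that maximizes TPR minus FPR, a quantity commonly known as Youden's J statistic. The dependency-free sketch below shows the same idea on made-up scores. `youden_threshold` and the toy data are hypothetical and stand in for the arrays returned by `roc_curve`.

```python
def youden_threshold(y_true, scores):
    """Return (threshold, J) maximizing J = TPR - FPR.

    A booking is predicted positive when its score is >= the threshold.
    """
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    best_t, best_j = None, -1.0
    # every distinct score is a candidate threshold, highest first
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        j = tp / n_pos - fp / n_neg
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


# Made-up labels and predicted probabilities (illustration only).
y_toy = [0, 0, 1, 0, 0, 1, 1, 1]
s_toy = [0.1, 0.2, 0.3, 0.35, 0.4, 0.6, 0.7, 0.9]
best_t, best_j = youden_threshold(y_toy, s_toy)
print(best_t, best_j)
```

Because J weights true positives and false positives equally, the resulting threshold tends to favor recall when positives are the minority class, which matches the shift from 0.5 down to about 0.34 here.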
Checking model performance on training set
# creating confusion matrix
confusion_matrix_statsmodels(
lg2, X_train2, y_train, threshold=optimal_threshold_auc_roc
)
# checking model performance for this model
log_reg_model_train_perf_threshold_auc_roc = model_performance_classification_statsmodels(
lg2, X_train2, y_train, threshold=optimal_threshold_auc_roc
)
print("Training performance:")
log_reg_model_train_perf_threshold_auc_roc
Training performance:
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.787135 | 0.754399 | 0.649159 | 0.697834 |
Checking model performance on test set
logit_roc_auc_test = roc_auc_score(y_test, lg2.predict(X_test2))
fpr, tpr, thresholds = roc_curve(y_test, lg2.predict(X_test2))
plt.figure(figsize=(7, 5))
plt.plot(fpr, tpr, label="Logistic Regression (area = %0.2f)" % logit_roc_auc_test)
plt.plot([0, 1], [0, 1], "r--")
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("Receiver operating characteristic")
plt.legend(loc="lower right")
plt.show()
# creating confusion matrix
confusion_matrix_statsmodels(lg2, X_test2, y_test, threshold=optimal_threshold_auc_roc)
# checking model performance for this model
log_reg_model_test_perf_threshold_auc_roc = model_performance_classification_statsmodels(
lg2, X_test2, y_test, threshold=optimal_threshold_auc_roc
)
print("Test performance:")
log_reg_model_test_perf_threshold_auc_roc
Test performance:
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.781089 | 0.748819 | 0.647214 | 0.694319 |
Precision-Recall Curve¶
y_scores = lg2.predict(X_train2)
prec, rec, tre = precision_recall_curve(y_train, y_scores,)
def plot_prec_recall_vs_tresh(precisions, recalls, thresholds):
    plt.plot(thresholds, precisions[:-1], "b--", label="precision")
    plt.plot(thresholds, recalls[:-1], "g--", label="recall")
    plt.xlabel("Threshold")
    plt.legend(loc="upper left")
    plt.ylim([0, 1])
plt.figure(figsize=(10, 7))
plot_prec_recall_vs_tresh(prec, rec, tre)
plt.show()
# setting the threshold
optimal_threshold_curve = 0.41
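The value 0.41 appears to have been read off the plot. Assuming the intent is the point where the precision and recall curves cross, one way to obtain such a threshold programmatically is to pick the candidate with the smallest gap between the two. The sketch below is dependency-free and uses made-up scores. `crossover_threshold` is a hypothetical helper, not part of the notebook.

```python
def crossover_threshold(y_true, scores):
    """Return (threshold, gap) where |precision - recall| is smallest.

    A booking is predicted positive when its score is >= the threshold.
    """
    n_pos = sum(y_true)
    best_t, best_gap = None, float("inf")
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        # tp + fp >= 1 because the item scoring exactly t is predicted positive
        precision = tp / (tp + fp)
        recall = tp / n_pos
        gap = abs(precision - recall)
        if gap < best_gap:
            best_t, best_gap = t, gap
    return best_t, best_gap


# Made-up labels and predicted probabilities (illustration only).
y_toy = [0, 0, 1, 0, 0, 1, 1, 1]
s_toy = [0.1, 0.2, 0.3, 0.35, 0.4, 0.6, 0.7, 0.9]
cross_t, cross_gap = crossover_threshold(y_toy, s_toy)
print(cross_t, cross_gap)
```

With the notebook's data, the same search could run over the `prec`, `rec`, and `tre` arrays returned by `precision_recall_curve`, which would make the choice reproducible rather than visual.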
Checking model performance on training set
# creating confusion matrix
confusion_matrix_statsmodels(lg2, X_train2, y_train, threshold=optimal_threshold_curve)
log_reg_model_train_perf_threshold_curve = model_performance_classification_statsmodels(
lg2, X_train2, y_train, threshold=optimal_threshold_curve
)
print("Training performance:")
log_reg_model_train_perf_threshold_curve
Training performance:
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.799352 | 0.704769 | 0.687337 | 0.695944 |
Checking model performance on test set
# creating confusion matrix
confusion_matrix_statsmodels(lg2, X_test2, y_test, threshold=optimal_threshold_curve)
log_reg_model_test_perf_threshold_curve = model_performance_classification_statsmodels(
lg2, X_test2, y_test, threshold=optimal_threshold_curve
)
print("Test performance:")
log_reg_model_test_perf_threshold_curve
Test performance:
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.796587 | 0.700472 | 0.691064 | 0.695736 |
Model Performance Comparison and Final Model Selection¶
# training performance comparison
models_train_comp_df = pd.concat(
[
log_reg_model_train_perf.T,
log_reg_model_train_perf_threshold_auc_roc.T,
log_reg_model_train_perf_threshold_curve.T,
],
axis=1,
)
models_train_comp_df.columns = [
"Logistic Regression-default Threshold (0.5)",
"Logistic Regression-0.34 Threshold",
"Logistic Regression-0.41 Threshold",
]
print("Training performance comparison:")
models_train_comp_df
Training performance comparison:
| | Logistic Regression-default Threshold (0.5) | Logistic Regression-0.34 Threshold | Logistic Regression-0.41 Threshold |
|---|---|---|---|
| Accuracy | 0.804175 | 0.787135 | 0.799352 |
| Recall | 0.624196 | 0.754399 | 0.704769 |
| Precision | 0.734857 | 0.649159 | 0.687337 |
| F1 | 0.675021 | 0.697834 | 0.695944 |
# testing performance comparison
models_test_comp_df = pd.concat(
[
log_reg_model_test_perf.T,
log_reg_model_test_perf_threshold_auc_roc.T,
log_reg_model_test_perf_threshold_curve.T,
],
axis=1,
)
models_test_comp_df.columns = [
"Logistic Regression-default Threshold (0.5)",
"Logistic Regression-0.34 Threshold",
"Logistic Regression-0.41 Threshold",
]
print("Test set performance comparison:")
models_test_comp_df
Test set performance comparison:
| | Logistic Regression-default Threshold (0.5) | Logistic Regression-0.34 Threshold | Logistic Regression-0.41 Threshold |
|---|---|---|---|
| Accuracy | 0.805351 | 0.781089 | 0.796587 |
| Recall | 0.630453 | 0.748819 | 0.700472 |
| Precision | 0.744178 | 0.647214 | 0.691064 |
| F1 | 0.682611 | 0.694319 | 0.695736 |
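The F1 row can be checked directly from the precision and recall rows, since F1 is their harmonic mean, $F1 = 2PR / (P + R)$. The sketch below recomputes it from the test-set values in the table above. The dictionary layout and names are illustrative.

```python
# Precision and recall copied from the test-set comparison table above.
test_metrics = {
    "0.50": {"precision": 0.744178, "recall": 0.630453},
    "0.34": {"precision": 0.647214, "recall": 0.748819},
    "0.41": {"precision": 0.691064, "recall": 0.700472},
}


def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


f1_by_threshold = {
    t: f1_from(m["precision"], m["recall"]) for t, m in test_metrics.items()
}
best_threshold = max(f1_by_threshold, key=f1_by_threshold.get)

for t, f1 in f1_by_threshold.items():
    print(f"threshold {t}: F1 = {f1:.4f}")
print("best test F1 at threshold", best_threshold)
```

The harmonic mean is pulled toward the smaller of the two inputs, which is why the threshold with the most even precision and recall (0.41) edges out the others on the test set.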
Threshold = 0.5 (Default):
- Best accuracy and precision
- More conservative (fewer false positives)
- Lower recall → may miss more actual cancellations
Threshold = 0.34:
- Highest recall (74.9% on the test set): captures more cancellations
- Trade-off: lowest precision and accuracy
- Useful when false negatives are more costly
Threshold = 0.41:
- Best F1 score on the test set (0.6957) → best balance between precision and recall; on the training set, 0.34 is marginally higher (0.6978 vs 0.6959)
- A good threshold for general-purpose, balanced classification tasks
Choosing a threshold:
- Use threshold = 0.34 if your goal is maximum recall (e.g., capture as many cancellations as possible, even with more false positives).
- Use threshold = 0.41 for the best balance between precision and recall (highest test F1 score).
- Keep threshold = 0.5 if precision or accuracy is the top priority (e.g., minimizing false positives).
Conclusions and Recommendations¶
We developed a predictive model that can assist hotel management in identifying customers likely to cancel their bookings, achieving an F1 score of about 0.70 on the test set (0.696 at the 0.41 threshold). This model can be used to optimize booking strategies and reduce revenue loss.
The logistic regression models demonstrated consistent and generalized performance across both training and test datasets.
Lead time, number of special requests, market segment type, and room type were among the most influential features in predicting cancellations. Longer lead times and higher average room prices increase the likelihood of a booking being canceled, as do, to a smaller degree, more adults and more weekend or week nights.
Conversely, more special requests, repeated guest status, a required car parking space, and bookings from the offline or corporate segments were associated with lower cancellation risk.
Business Recommendations¶
Hotels should closely monitor bookings with long lead times, especially when combined with no special requests or with market segments other than Corporate and Offline, which showed lower cancellation odds in the model. These patterns may indicate non-committal bookings.
Loyalty programs that encourage repeated bookings should be prioritized, as repeat guests are less likely to cancel.
Adjust overbooking strategies by incorporating cancellation risk scores to avoid revenue loss and manage room inventory more effectively.
Policies should be considered to promote flexible but binding reservations, such as partial prepayments or stricter cancellation terms for high-risk bookings.
Efforts should be made to analyze price sensitivity, as average room price was also linked to cancellation behavior. Tailored offers or flexible pricing for at-risk bookings can reduce churn.
Building a Decision Tree model¶
Data Preparation for Modeling¶
# specifying the independent and dependent variables
X = data_clean.drop(["booking_status"], axis=1)
Y = data_clean["booking_status"]
# creating dummy variables
X = pd.get_dummies(X, drop_first=True)
#X = X.astype(float)
# splitting in training and test set
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.3, random_state=1)
print("Shape of Training set : ", X_train.shape)
print("Shape of test set : ", X_test.shape)
print("Percentage of classes in training set:")
print(y_train.value_counts(normalize=True))
print("Percentage of classes in test set:")
print(y_test.value_counts(normalize=True))
Shape of Training set :  (25293, 27)
Shape of test set :  (10840, 27)
Percentage of classes in training set:
booking_status
0    0.674179
1    0.325821
Name: proportion, dtype: float64
Percentage of classes in test set:
booking_status
0    0.667989
1    0.332011
Name: proportion, dtype: float64
Decision Tree (default)¶
model0 = DecisionTreeClassifier(random_state=1)
model0.fit(X_train, y_train)
DecisionTreeClassifier(random_state=1)
Model Evaluation¶
Model evaluation criterion
Model can make wrong predictions as:
- Predicting a booking will not be canceled but in reality, the booking is canceled (FN)
- Predicting a booking will be canceled but in reality, the booking is not canceled (FP)
Which case is more important?
- If we predict that a booking will not be canceled but in reality it is canceled, then the hotel loses revenue when it cannot resell the room and may bear additional distribution costs or last-minute price cuts
- If we predict that a booking will be canceled but in reality it is not, then the hotel may act unnecessarily on a guest who was going to arrive, for example through overbooking or stricter terms
- Missed cancellations lead directly to the revenue losses this project aims to reduce, so false negatives are treated here as the costlier error
How to reduce the losses?
The hotel would want the recall to be high: the greater the recall score, the lower the number of False Negatives. F1, the metric of choice in the logistic regression section, is reported alongside it to keep False Positives in check.
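The link between the two error types listed above and the metrics reported below can be sketched from the four cells of a confusion matrix. The counts here are made up for illustration and `metrics_from_counts` is a hypothetical helper. The notebook itself obtains these values through scikit-learn's scoring functions.

```python
def metrics_from_counts(tp, fn, fp, tn):
    """Accuracy, recall, precision and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    recall = tp / (tp + fn)        # falls as false negatives grow
    precision = tp / (tp + fp)     # falls as false positives grow
    f1 = 2 * tp / (2 * tp + fp + fn)
    return {"Accuracy": accuracy, "Recall": recall, "Precision": precision, "F1": f1}


# Made-up counts (illustration only).
toy = metrics_from_counts(tp=80, fn=20, fp=25, tn=175)
for name, value in toy.items():
    print(f"{name}: {value:.4f}")
```

Only FN appears in the recall denominator, so raising recall is equivalent to reducing false negatives for a fixed number of actual positives. FP affects precision and F1 but not recall.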
# defining a function to compute different metrics to check performance of a classification model built using sklearn
def model_performance_classification_sklearn(model, predictors, target):
    """
    Function to compute different metrics to check classification model performance
    model: classifier
    predictors: independent variables
    target: dependent variable
    """
    # predicting using the independent variables
    pred = model.predict(predictors)
    acc = accuracy_score(target, pred)  # to compute Accuracy
    recall = recall_score(target, pred)  # to compute Recall
    precision = precision_score(target, pred)  # to compute Precision
    f1 = f1_score(target, pred)  # to compute F1-score
    # creating a dataframe of metrics
    df_perf = pd.DataFrame(
        {"Accuracy": acc, "Recall": recall, "Precision": precision, "F1": f1,},
        index=[0],
    )
    return df_perf
def confusion_matrix_sklearn(model, predictors, target):
    """
    To plot the confusion_matrix with percentages
    model: classifier
    predictors: independent variables
    target: dependent variable
    """
    y_pred = model.predict(predictors)
    cm = confusion_matrix(target, y_pred)
    labels = np.asarray(
        [
            ["{0:0.0f}".format(item) + "\n{0:.2%}".format(item / cm.flatten().sum())]
            for item in cm.flatten()
        ]
    ).reshape(2, 2)
    plt.figure(figsize=(6, 4))
    sns.heatmap(cm, annot=labels, fmt="")
    plt.ylabel("True label")
    plt.xlabel("Predicted label")
confusion_matrix_sklearn(model0, X_train, y_train)
decision_tree_perf_train = model_performance_classification_sklearn(
model0, X_train, y_train
)
decision_tree_perf_train
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.994109 | 0.985924 | 0.995955 | 0.990914 |
confusion_matrix_sklearn(model0, X_test, y_test)
decision_tree_perf_test = model_performance_classification_sklearn(
model0, X_test, y_test
)
decision_tree_perf_test
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.869926 | 0.799667 | 0.80684 | 0.803238 |
Visualizing the Decision Tree¶
feature_names = list(X.columns)
print(feature_names)
['no_of_adults', 'no_of_children', 'no_of_weekend_nights', 'no_of_week_nights', 'required_car_parking_space', 'lead_time', 'arrival_year', 'arrival_month', 'arrival_date', 'repeated_guest', 'no_of_previous_cancellations', 'no_of_previous_bookings_not_canceled', 'avg_price_per_room', 'no_of_special_requests', 'type_of_meal_plan_Meal Plan 2', 'type_of_meal_plan_Meal Plan 3', 'type_of_meal_plan_Not Selected', 'room_type_reserved_Room_Type 2', 'room_type_reserved_Room_Type 3', 'room_type_reserved_Room_Type 4', 'room_type_reserved_Room_Type 5', 'room_type_reserved_Room_Type 6', 'room_type_reserved_Room_Type 7', 'market_segment_type_Complementary', 'market_segment_type_Corporate', 'market_segment_type_Offline', 'market_segment_type_Online']
plt.figure(figsize=(20, 10))
out = tree.plot_tree(
model0,
feature_names=feature_names,
filled=True,
fontsize=9,
node_ids=False,
class_names=None,
)
# below code will add arrows to the decision tree split if they are missing
for o in out:
    arrow = o.arrow_patch
    if arrow is not None:
        arrow.set_edgecolor("black")
        arrow.set_linewidth(1)
plt.show()
# Text report showing the rules of a decision tree -
print(tree.export_text(model0, feature_names=feature_names, show_weights=True))
|--- lead_time <= 151.50 | |--- no_of_special_requests <= 0.50 | | |--- market_segment_type_Online <= 0.50 | | | |--- lead_time <= 90.50 | | | | |--- avg_price_per_room <= 201.50 | | | | | |--- no_of_weekend_nights <= 0.50 | | | | | | |--- market_segment_type_Offline <= 0.50 | | | | | | | |--- avg_price_per_room <= 88.50 | | | | | | | | |--- repeated_guest <= 0.50 | | | | | | | | | |--- market_segment_type_Corporate <= 0.50 | | | | | | | | | | |--- weights: [74.00, 0.00] class: 0 | | | | | | | | | |--- market_segment_type_Corporate > 0.50 | | | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | | | |--- truncated branch of depth 12 | | | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | | | |--- weights: [55.00, 0.00] class: 0 | | | | | | | | |--- repeated_guest > 0.50 | | | | | | | | | |--- lead_time <= 32.50 | | | | | | | | | | |--- weights: [142.00, 0.00] class: 0 | | | | | | | | | |--- lead_time > 32.50 | | | | | | | | | | |--- no_of_previous_bookings_not_canceled <= 10.00 | | | | | | | | | | | |--- weights: [5.00, 0.00] class: 0 | | | | | | | | | | |--- no_of_previous_bookings_not_canceled > 10.00 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | |--- avg_price_per_room > 88.50 | | | | | | | | |--- lead_time <= 9.50 | | | | | | | | | |--- avg_price_per_room <= 162.53 | | | | | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | | |--- avg_price_per_room > 162.53 | | | | | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | |--- lead_time > 9.50 | | | | | | | | | |--- arrival_date <= 23.00 | | | | | | | | | | |--- no_of_adults <= 1.50 | | | | | 
| | | | | | |--- truncated branch of depth 6 | | | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | | | |--- truncated branch of depth 10 | | | | | | | | | |--- arrival_date > 23.00 | | | | | | | | | | |--- no_of_previous_bookings_not_canceled <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- no_of_previous_bookings_not_canceled > 0.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | |--- market_segment_type_Offline > 0.50 | | | | | | | |--- weights: [1643.00, 0.00] class: 0 | | | | | |--- no_of_weekend_nights > 0.50 | | | | | | |--- lead_time <= 68.50 | | | | | | | |--- arrival_date <= 27.50 | | | | | | | | |--- no_of_weekend_nights <= 4.50 | | | | | | | | | |--- arrival_month <= 9.50 | | | | | | | | | | |--- arrival_date <= 14.50 | | | | | | | | | | | |--- truncated branch of depth 12 | | | | | | | | | | |--- arrival_date > 14.50 | | | | | | | | | | | |--- truncated branch of depth 18 | | | | | | | | | |--- arrival_month > 9.50 | | | | | | | | | | |--- lead_time <= 65.50 | | | | | | | | | | | |--- truncated branch of depth 12 | | | | | | | | | | |--- lead_time > 65.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | |--- no_of_weekend_nights > 4.50 | | | | | | | | | |--- weights: [0.00, 5.00] class: 1 | | | | | | | |--- arrival_date > 27.50 | | | | | | | | |--- lead_time <= 1.50 | | | | | | | | | |--- arrival_month <= 3.50 | | | | | | | | | | |--- avg_price_per_room <= 31.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | | |--- avg_price_per_room > 31.50 | | | | | | | | | | | |--- weights: [0.00, 34.00] class: 1 | | | | | | | | | |--- arrival_month > 3.50 | | | | | | | | | | |--- avg_price_per_room <= 72.00 | | | | | | | | | | | |--- weights: [8.00, 0.00] class: 0 | | | | | | | | | | |--- avg_price_per_room > 72.00 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | |--- lead_time > 1.50 | | | | | | | | | |--- avg_price_per_room 
<= 40.83 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | |--- avg_price_per_room > 40.83 | | | | | | | | | | |--- avg_price_per_room <= 120.97 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- avg_price_per_room > 120.97 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | |--- lead_time > 68.50 | | | | | | | |--- avg_price_per_room <= 99.98 | | | | | | | | |--- arrival_month <= 3.50 | | | | | | | | | |--- arrival_date <= 17.50 | | | | | | | | | | |--- arrival_date <= 16.00 | | | | | | | | | | | |--- weights: [7.00, 0.00] class: 0 | | | | | | | | | | |--- arrival_date > 16.00 | | | | | | | | | | | |--- weights: [0.00, 9.00] class: 1 | | | | | | | | | |--- arrival_date > 17.50 | | | | | | | | | | |--- no_of_weekend_nights <= 1.50 | | | | | | | | | | | |--- weights: [18.00, 0.00] class: 0 | | | | | | | | | | |--- no_of_weekend_nights > 1.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | |--- arrival_month > 3.50 | | | | | | | | | |--- avg_price_per_room <= 92.27 | | | | | | | | | | |--- lead_time <= 70.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- lead_time > 70.50 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | | |--- avg_price_per_room > 92.27 | | | | | | | | | | |--- no_of_week_nights <= 2.50 | | | | | | | | | | | |--- weights: [4.00, 0.00] class: 0 | | | | | | | | | | |--- no_of_week_nights > 2.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | |--- avg_price_per_room > 99.98 | | | | | | | | |--- lead_time <= 81.00 | | | | | | | | | |--- avg_price_per_room <= 123.25 | | | | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | | | | |--- weights: [0.00, 47.00] class: 1 | | | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | |--- avg_price_per_room > 123.25 | | | | | | | | | | |--- weights: [6.00, 0.00] class: 0 | | | | | | | | 
|--- lead_time > 81.00 | | | | | | | | | |--- lead_time <= 88.50 | | | | | | | | | | |--- weights: [18.00, 0.00] class: 0 | | | | | | | | | |--- lead_time > 88.50 | | | | | | | | | | |--- lead_time <= 89.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | | |--- lead_time > 89.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | |--- avg_price_per_room > 201.50 | | | | | |--- arrival_month <= 10.50 | | | | | | |--- weights: [0.00, 19.00] class: 1 | | | | | |--- arrival_month > 10.50 | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | |--- lead_time > 90.50 | | | | |--- lead_time <= 117.50 | | | | | |--- avg_price_per_room <= 93.58 | | | | | | |--- arrival_date <= 6.50 | | | | | | | |--- no_of_week_nights <= 2.50 | | | | | | | | |--- avg_price_per_room <= 73.75 | | | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | | |--- weights: [5.00, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 73.75 | | | | | | | | | |--- avg_price_per_room <= 80.38 | | | | | | | | | | |--- weights: [0.00, 65.00] class: 1 | | | | | | | | | |--- avg_price_per_room > 80.38 | | | | | | | | | | |--- lead_time <= 93.00 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | | |--- lead_time > 93.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | |--- no_of_week_nights > 2.50 | | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | |--- weights: [32.00, 0.00] class: 0 | | | | | | |--- arrival_date > 6.50 | | | | | | | |--- avg_price_per_room <= 65.48 | | | | | | | | |--- no_of_weekend_nights <= 1.50 | | | | | | | | | |--- weights: [27.00, 0.00] 
class: 0 | | | | | | | | |--- no_of_weekend_nights > 1.50 | | | | | | | | | |--- avg_price_per_room <= 58.75 | | | | | | | | | | |--- weights: [5.00, 0.00] class: 0 | | | | | | | | | |--- avg_price_per_room > 58.75 | | | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | |--- avg_price_per_room > 65.48 | | | | | | | | |--- arrival_month <= 5.50 | | | | | | | | | |--- avg_price_per_room <= 73.50 | | | | | | | | | | |--- no_of_week_nights <= 2.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- no_of_week_nights > 2.50 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | | |--- avg_price_per_room > 73.50 | | | | | | | | | | |--- arrival_date <= 10.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- arrival_date > 10.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | |--- arrival_month > 5.50 | | | | | | | | | |--- arrival_date <= 11.50 | | | | | | | | | | |--- avg_price_per_room <= 69.00 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | | |--- avg_price_per_room > 69.00 | | | | | | | | | | | |--- weights: [36.00, 0.00] class: 0 | | | | | | | | | |--- arrival_date > 11.50 | | | | | | | | | | |--- arrival_month <= 6.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- arrival_month > 6.50 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | |--- avg_price_per_room > 93.58 | | | | | | |--- arrival_month <= 5.50 | | | | | | | |--- no_of_week_nights <= 3.50 | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 <= 0.50 | | | | | | | | | |--- lead_time <= 114.50 | | | | | | | | | | |--- no_of_weekend_nights <= 0.50 | | | | | | | | | | | |--- weights: [0.00, 59.00] class: 1 | | | | | | | | | | |--- 
no_of_weekend_nights > 0.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- lead_time > 114.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 > 0.50 | | | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | |--- no_of_week_nights > 3.50 | | | | | | | | |--- weights: [4.00, 0.00] class: 0 | | | | | | |--- arrival_month > 5.50 | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | |--- arrival_date <= 14.00 | | | | | | | | | |--- lead_time <= 110.50 | | | | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | | | | |--- weights: [8.00, 4.00] class: 0 | | | | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- lead_time > 110.50 | | | | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | | | | |--- weights: [0.00, 7.00] class: 1 | | | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | |--- arrival_date > 14.00 | | | | | | | | | |--- avg_price_per_room <= 138.25 | | | | | | | | | | |--- lead_time <= 101.00 | | | | | | | | | | | |--- weights: [0.00, 24.00] class: 1 | | | | | | | | | | |--- lead_time > 101.00 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | |--- avg_price_per_room > 138.25 | | | | | | | | | | |--- avg_price_per_room <= 177.83 | | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | | | | | |--- avg_price_per_room > 177.83 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | |--- avg_price_per_room <= 101.88 | | | | | | | | | |--- lead_time <= 102.50 | | | | | | | | | | |--- weights: [0.00, 37.00] class: 1 | | | | | | | | | |--- lead_time > 102.50 | | | | | | | | | | 
|--- avg_price_per_room <= 98.96 | | | | | | | | | | | |--- weights: [5.00, 0.00] class: 0 | | | | | | | | | | |--- avg_price_per_room > 98.96 | | | | | | | | | | | |--- weights: [16.00, 1.00] class: 0 | | | | | | | | |--- avg_price_per_room > 101.88 | | | | | | | | | |--- arrival_date <= 15.00 | | | | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | | | | |--- weights: [0.00, 10.00] class: 1 | | | | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | |--- arrival_date > 15.00 | | | | | | | | | | |--- lead_time <= 111.00 | | | | | | | | | | | |--- weights: [48.00, 0.00] class: 0 | | | | | | | | | | |--- lead_time > 111.00 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | |--- lead_time > 117.50 | | | | | |--- no_of_week_nights <= 0.50 | | | | | | |--- avg_price_per_room <= 92.50 | | | | | | | |--- avg_price_per_room <= 85.38 | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | |--- avg_price_per_room > 85.38 | | | | | | | | |--- weights: [7.00, 3.00] class: 0 | | | | | | |--- avg_price_per_room > 92.50 | | | | | | | |--- arrival_date <= 11.00 | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | |--- arrival_date > 11.00 | | | | | | | | |--- weights: [0.00, 21.00] class: 1 | | | | | |--- no_of_week_nights > 0.50 | | | | | | |--- no_of_adults <= 1.50 | | | | | | | |--- weights: [133.00, 0.00] class: 0 | | | | | | |--- no_of_adults > 1.50 | | | | | | | |--- avg_price_per_room <= 89.88 | | | | | | | | |--- no_of_week_nights <= 8.50 | | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | | |--- arrival_month <= 10.50 | | | | | | | | | | | |--- truncated branch of depth 11 | | | | | | | | | | |--- arrival_month > 10.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | | |--- weights: [56.00, 0.00] class: 0 | | | | | | | | |--- no_of_week_nights > 8.50 | | | | | | | | | |--- weights: 
[0.00, 1.00] class: 1 | | | | | | | |--- avg_price_per_room > 89.88 | | | | | | | | |--- avg_price_per_room <= 96.45 | | | | | | | | | |--- lead_time <= 125.50 | | | | | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | | |--- lead_time > 125.50 | | | | | | | | | | |--- avg_price_per_room <= 92.72 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- avg_price_per_room > 92.72 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | |--- avg_price_per_room > 96.45 | | | | | | | | | |--- avg_price_per_room <= 115.81 | | | | | | | | | | |--- no_of_children <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- no_of_children > 0.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | |--- avg_price_per_room > 115.81 | | | | | | | | | | |--- avg_price_per_room <= 131.00 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- avg_price_per_room > 131.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | |--- market_segment_type_Online > 0.50 | | | |--- lead_time <= 13.50 | | | | |--- lead_time <= 3.50 | | | | | |--- avg_price_per_room <= 202.67 | | | | | | |--- arrival_month <= 5.50 | | | | | | | |--- no_of_weekend_nights <= 1.50 | | | | | | | | |--- avg_price_per_room <= 100.50 | | | | | | | | | |--- arrival_month <= 1.50 | | | | | | | | | | |--- weights: [54.00, 0.00] class: 0 | | | | | | | | | |--- arrival_month > 1.50 | | | | | | | | | | |--- avg_price_per_room <= 77.50 | | | | | | | | | | | |--- weights: [24.00, 0.00] class: 0 | | | | | | | | | | |--- avg_price_per_room > 77.50 | | | | | | | | | | | |--- truncated branch of depth 12 | | | | | | | | |--- avg_price_per_room > 100.50 | | | | | | | | | |--- avg_price_per_room <= 106.50 | | | | | | | 
| | | |--- avg_price_per_room <= 104.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- avg_price_per_room > 104.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | |--- avg_price_per_room > 106.50 | | | | | | | | | | |--- avg_price_per_room <= 130.50 | | | | | | | | | | | |--- weights: [34.00, 0.00] class: 0 | | | | | | | | | | |--- avg_price_per_room > 130.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | |--- no_of_weekend_nights > 1.50 | | | | | | | | |--- arrival_date <= 26.00 | | | | | | | | | |--- arrival_month <= 3.50 | | | | | | | | | | |--- no_of_week_nights <= 5.00 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- no_of_week_nights > 5.00 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | |--- arrival_month > 3.50 | | | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | | | |--- weights: [0.00, 4.00] class: 1 | | | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | |--- arrival_date > 26.00 | | | | | | | | | |--- avg_price_per_room <= 75.00 | | | | | | | | | | |--- weights: [0.00, 13.00] class: 1 | | | | | | | | | |--- avg_price_per_room > 75.00 | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | |--- arrival_month > 5.50 | | | | | | | |--- no_of_week_nights <= 8.50 | | | | | | | | |--- avg_price_per_room <= 137.50 | | | | | | | | | |--- avg_price_per_room <= 118.90 | | | | | | | | | | |--- avg_price_per_room <= 118.04 | | | | | | | | | | | |--- truncated branch of depth 11 | | | | | | | | | | |--- avg_price_per_room > 118.04 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | | |--- avg_price_per_room > 118.90 | | | | | | | | | | |--- weights: [67.00, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 137.50 | | | | | | | | | |--- avg_price_per_room <= 139.50 | | | | | | | | | | 
|--- arrival_month <= 7.50 | | | | | | | | | | | |--- weights: [0.00, 4.00] class: 1 | | | | | | | | | | |--- arrival_month > 7.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | |--- avg_price_per_room > 139.50 | | | | | | | | | | |--- lead_time <= 2.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | | |--- lead_time > 2.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | |--- no_of_week_nights > 8.50 | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | |--- avg_price_per_room > 202.67 | | | | | | |--- weights: [0.00, 11.00] class: 1 | | | | |--- lead_time > 3.50 | | | | | |--- avg_price_per_room <= 99.17 | | | | | | |--- arrival_month <= 11.50 | | | | | | | |--- arrival_month <= 1.50 | | | | | | | | |--- weights: [42.00, 0.00] class: 0 | | | | | | | |--- arrival_month > 1.50 | | | | | | | | |--- no_of_week_nights <= 3.50 | | | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | | | |--- lead_time <= 4.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- lead_time > 4.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | | | | |--- truncated branch of depth 10 | | | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | | | |--- truncated branch of depth 11 | | | | | | | | |--- no_of_week_nights > 3.50 | | | | | | | | | |--- no_of_weekend_nights <= 0.50 | | | | | | | | | | |--- arrival_date <= 1.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | | |--- arrival_date > 1.50 | | | | | | | | | | | |--- weights: [4.00, 0.00] class: 0 | | | | | | | | | |--- no_of_weekend_nights > 0.50 | | | | | | | | | | |--- avg_price_per_room <= 85.28 | | | | | | | | | | | |--- weights: [0.00, 9.00] class: 1 | | | | | | | | | | |--- avg_price_per_room > 85.28 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | |--- 
arrival_month > 11.50 | | | | | | | |--- weights: [91.00, 0.00] class: 0 | | | | | |--- avg_price_per_room > 99.17 | | | | | | |--- arrival_month <= 8.50 | | | | | | | |--- no_of_weekend_nights <= 1.50 | | | | | | | | |--- arrival_date <= 21.50 | | | | | | | | | |--- lead_time <= 4.50 | | | | | | | | | | |--- weights: [0.00, 10.00] class: 1 | | | | | | | | | |--- lead_time > 4.50 | | | | | | | | | | |--- arrival_date <= 1.50 | | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | | | | | |--- arrival_date > 1.50 | | | | | | | | | | | |--- truncated branch of depth 13 | | | | | | | | |--- arrival_date > 21.50 | | | | | | | | | |--- lead_time <= 9.50 | | | | | | | | | | |--- lead_time <= 7.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- lead_time > 7.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | |--- lead_time > 9.50 | | | | | | | | | | |--- arrival_date <= 24.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- arrival_date > 24.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | |--- no_of_weekend_nights > 1.50 | | | | | | | | |--- arrival_month <= 2.50 | | | | | | | | | |--- no_of_week_nights <= 8.00 | | | | | | | | | | |--- lead_time <= 8.00 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | | | |--- lead_time > 8.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- no_of_week_nights > 8.00 | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | |--- arrival_month > 2.50 | | | | | | | | | |--- avg_price_per_room <= 121.40 | | | | | | | | | | |--- avg_price_per_room <= 120.40 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- avg_price_per_room > 120.40 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | |--- avg_price_per_room > 121.40 | | | | | | | | | | |--- weights: [0.00, 37.00] class: 1 | | | | | | |--- 
arrival_month > 8.50 | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | | |--- arrival_month <= 9.50 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | |--- arrival_month > 9.50 | | | | | | | | | | |--- weights: [5.00, 0.00] class: 0 | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | |--- weights: [29.00, 0.00] class: 0 | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | |--- lead_time <= 9.50 | | | | | | | | | | |--- avg_price_per_room <= 162.17 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- avg_price_per_room > 162.17 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- lead_time > 9.50 | | | | | | | | | | |--- arrival_date <= 26.50 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | | | | | | |--- arrival_date > 26.50 | | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | |--- weights: [20.00, 0.00] class: 0 | | | |--- lead_time > 13.50 | | | | |--- avg_price_per_room <= 105.27 | | | | | |--- lead_time <= 25.50 | | | | | | |--- arrival_month <= 8.50 | | | | | | | |--- arrival_month <= 1.50 | | | | | | | | |--- lead_time <= 24.50 | | | | | | | | | |--- weights: [27.00, 0.00] class: 0 | | | | | | | | |--- lead_time > 24.50 | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | |--- arrival_month > 1.50 | | | | | | | | |--- arrival_month <= 4.50 | | | | | | | | | |--- arrival_date <= 19.50 | | | | | | | | | | |--- no_of_weekend_nights <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | | | |--- no_of_weekend_nights > 0.50 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | | | |--- arrival_date > 19.50 | | | | | | | | | | |--- arrival_date <= 26.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- arrival_date > 
26.50 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | |--- arrival_month > 4.50 | | | | | | | | | |--- arrival_date <= 18.50 | | | | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | |--- arrival_date > 18.50 | | | | | | | | | | |--- arrival_date <= 30.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- arrival_date > 30.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | |--- arrival_month > 8.50 | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | | |--- weights: [27.00, 0.00] class: 0 | | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | | |--- arrival_date <= 13.00 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | | |--- arrival_date > 13.00 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | | | |--- arrival_date <= 6.00 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | | | |--- arrival_date > 6.00 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | | | |--- lead_time <= 19.00 | | | | | | | | | | | |--- weights: [0.00, 7.00] class: 1 | | | | | | | | | | |--- lead_time > 19.00 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | |--- arrival_month > 11.50 | | | | | | | | |--- avg_price_per_room <= 103.83 | | | | | | | | | |--- weights: [57.00, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 103.83 | | | | | | | | | |--- lead_time <= 24.50 | | | | | | | | | | |--- weights: [4.00, 0.00] class: 0 | | | | | | | | | |--- lead_time > 
24.50 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | |--- lead_time > 25.50 | | | | | | |--- avg_price_per_room <= 59.43 | | | | | | | |--- avg_price_per_room <= 29.10 | | | | | | | | |--- weights: [27.00, 0.00] class: 0 | | | | | | | |--- avg_price_per_room > 29.10 | | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | | |--- avg_price_per_room <= 55.36 | | | | | | | | | | |--- arrival_date <= 25.00 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- arrival_date > 25.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- avg_price_per_room > 55.36 | | | | | | | | | | |--- no_of_week_nights <= 6.00 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- no_of_week_nights > 6.00 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | | |--- arrival_month <= 2.50 | | | | | | | | | | |--- lead_time <= 76.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- lead_time > 76.50 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | | |--- arrival_month > 2.50 | | | | | | | | | | |--- lead_time <= 55.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- lead_time > 55.50 | | | | | | | | | | | |--- weights: [16.00, 0.00] class: 0 | | | | | | |--- avg_price_per_room > 59.43 | | | | | | | |--- required_car_parking_space <= 0.50 | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 <= 0.50 | | | | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | | | | |--- truncated branch of depth 11 | | | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | | | |--- truncated branch of depth 25 | | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 > 0.50 | | | | | | | | | | |--- arrival_month <= 4.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | | |--- 
arrival_month > 4.50 | | | | | | | | | | | |--- weights: [0.00, 31.00] class: 1 | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | | | |--- lead_time <= 48.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- lead_time > 48.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | | |--- avg_price_per_room <= 61.87 | | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | | | | | |--- avg_price_per_room > 61.87 | | | | | | | | | | | |--- truncated branch of depth 17 | | | | | | | |--- required_car_parking_space > 0.50 | | | | | | | | |--- weights: [21.00, 0.00] class: 0 | | | | |--- avg_price_per_room > 105.27 | | | | | |--- required_car_parking_space <= 0.50 | | | | | | |--- arrival_year <= 2017.50 | | | | | | | |--- type_of_meal_plan_Meal Plan 2 <= 0.50 | | | | | | | | |--- arrival_month <= 8.50 | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- arrival_month > 8.50 | | | | | | | | | |--- arrival_date <= 30.50 | | | | | | | | | | |--- arrival_month <= 9.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- arrival_month > 9.50 | | | | | | | | | | | |--- weights: [20.00, 0.00] class: 0 | | | | | | | | | |--- arrival_date > 30.50 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | |--- type_of_meal_plan_Meal Plan 2 > 0.50 | | | | | | | | |--- arrival_date <= 15.00 | | | | | | | | | |--- no_of_weekend_nights <= 0.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | |--- no_of_weekend_nights > 0.50 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- arrival_date > 15.00 | | | | | | | | | |--- weights: [0.00, 8.00] class: 1 | | | | | | |--- arrival_year > 2017.50 | | | | | | | |--- lead_time <= 47.50 | | | | | | | | |--- arrival_month <= 10.50 | | | | | | | | | |--- 
avg_price_per_room <= 195.43 | | | | | | | | | | |--- lead_time <= 38.50 | | | | | | | | | | | |--- truncated branch of depth 15 | | | | | | | | | | |--- lead_time > 38.50 | | | | | | | | | | | |--- truncated branch of depth 16 | | | | | | | | | |--- avg_price_per_room > 195.43 | | | | | | | | | | |--- weights: [0.00, 49.00] class: 1 | | | | | | | | |--- arrival_month > 10.50 | | | | | | | | | |--- lead_time <= 22.00 | | | | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | | | |--- weights: [19.00, 0.00] class: 0 | | | | | | | | | |--- lead_time > 22.00 | | | | | | | | | | |--- arrival_date <= 7.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- arrival_date > 7.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | |--- lead_time > 47.50 | | | | | | | | |--- arrival_month <= 8.50 | | | | | | | | | |--- avg_price_per_room <= 145.86 | | | | | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 20 | | | | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | | | | | |--- truncated branch of depth 13 | | | | | | | | | |--- avg_price_per_room > 145.86 | | | | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | | | |--- truncated branch of depth 15 | | | | | | | | |--- arrival_month > 8.50 | | | | | | | | | |--- arrival_date <= 7.50 | | | | | | | | | | |--- arrival_date <= 4.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- arrival_date > 4.50 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | | |--- arrival_date > 7.50 | | | | | | | | | | |--- avg_price_per_room <= 122.97 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- avg_price_per_room > 122.97 | | | | | | 
| | | | | |--- truncated branch of depth 9 | | | | | |--- required_car_parking_space > 0.50 | | | | | | |--- no_of_weekend_nights <= 3.50 | | | | | | | |--- weights: [40.00, 0.00] class: 0 | | | | | | |--- no_of_weekend_nights > 3.50 | | | | | | | |--- weights: [0.00, 1.00] class: 1 | |--- no_of_special_requests > 0.50 | | |--- no_of_special_requests <= 1.50 | | | |--- market_segment_type_Online <= 0.50 | | | | |--- lead_time <= 150.50 | | | | | |--- no_of_weekend_nights <= 2.50 | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | |--- lead_time <= 91.50 | | | | | | | | |--- room_type_reserved_Room_Type 5 <= 0.50 | | | | | | | | | |--- weights: [881.00, 0.00] class: 0 | | | | | | | | |--- room_type_reserved_Room_Type 5 > 0.50 | | | | | | | | | |--- market_segment_type_Corporate <= 0.50 | | | | | | | | | | |--- weights: [14.00, 0.00] class: 0 | | | | | | | | | |--- market_segment_type_Corporate > 0.50 | | | | | | | | | | |--- avg_price_per_room <= 150.97 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | | |--- avg_price_per_room > 150.97 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | |--- lead_time > 91.50 | | | | | | | | |--- no_of_week_nights <= 2.50 | | | | | | | | | |--- arrival_date <= 9.00 | | | | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | | | | |--- weights: [0.00, 4.00] class: 1 | | | | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- arrival_date > 9.00 | | | | | | | | | | |--- no_of_children <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- no_of_children > 0.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | |--- no_of_week_nights > 2.50 | | | | | | | | | |--- lead_time <= 92.50 | | | | | | | | | | |--- arrival_month <= 6.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | | |--- arrival_month > 6.50 | | | | | | | | | | 
| |--- weights: [3.00, 0.00] class: 0 | | | | | | | | | |--- lead_time > 92.50 | | | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 <= 0.50 | | | | | | | | | | | |--- weights: [60.00, 0.00] class: 0 | | | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 > 0.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | |--- market_segment_type_Corporate <= 0.50 | | | | | | | | |--- weights: [17.00, 0.00] class: 0 | | | | | | | |--- market_segment_type_Corporate > 0.50 | | | | | | | | |--- lead_time <= 42.50 | | | | | | | | | |--- weights: [2.00, 1.00] class: 0 | | | | | | | | |--- lead_time > 42.50 | | | | | | | | | |--- weights: [0.00, 4.00] class: 1 | | | | | |--- no_of_weekend_nights > 2.50 | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | |--- lead_time > 150.50 | | | | | |--- weights: [0.00, 2.00] class: 1 | | | |--- market_segment_type_Online > 0.50 | | | | |--- lead_time <= 9.50 | | | | | |--- lead_time <= 4.50 | | | | | | |--- avg_price_per_room <= 157.93 | | | | | | | |--- no_of_week_nights <= 10.50 | | | | | | | | |--- no_of_week_nights <= 4.50 | | | | | | | | | |--- arrival_month <= 7.50 | | | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | | | |--- weights: [204.00, 0.00] class: 0 | | | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | | | | | |--- arrival_month > 7.50 | | | | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | | | | |--- truncated branch of depth 10 | | | | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | |--- no_of_week_nights > 4.50 | | | | | | | | | |--- no_of_adults <= 2.50 | | | | | | | | | | |--- weights: [10.00, 0.00] class: 0 | | | | | | | | | |--- no_of_adults > 2.50 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | |--- no_of_week_nights > 10.50 | | | | | | | | |--- 
avg_price_per_room <= 76.40 | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 76.40 | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | |--- avg_price_per_room > 157.93 | | | | | | | |--- avg_price_per_room <= 158.50 | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | |--- avg_price_per_room > 158.50 | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 <= 0.50 | | | | | | | | | |--- avg_price_per_room <= 241.00 | | | | | | | | | | |--- no_of_week_nights <= 3.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- no_of_week_nights > 3.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | |--- avg_price_per_room > 241.00 | | | | | | | | | | |--- avg_price_per_room <= 243.40 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | | |--- avg_price_per_room > 243.40 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 > 0.50 | | | | | | | | | |--- no_of_week_nights <= 2.50 | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | | |--- no_of_week_nights > 2.50 | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | |--- lead_time > 4.50 | | | | | | |--- avg_price_per_room <= 123.50 | | | | | | | |--- arrival_month <= 8.50 | | | | | | | | |--- arrival_date <= 12.50 | | | | | | | | | |--- no_of_week_nights <= 2.50 | | | | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | |--- no_of_week_nights > 2.50 | | | | | | | | | | |--- weights: [14.00, 0.00] class: 0 | | | | | | | | |--- arrival_date > 12.50 | | | | | | | | | |--- arrival_month <= 7.50 | | | | | | | | | | |--- avg_price_per_room <= 66.40 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- 
avg_price_per_room > 66.40 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | |--- arrival_month > 7.50 | | | | | | | | | | |--- arrival_date <= 13.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | | |--- arrival_date > 13.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | |--- arrival_month > 8.50 | | | | | | | | |--- arrival_date <= 29.50 | | | | | | | | | |--- weights: [96.00, 0.00] class: 0 | | | | | | | | |--- arrival_date > 29.50 | | | | | | | | | |--- no_of_weekend_nights <= 0.50 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | |--- no_of_weekend_nights > 0.50 | | | | | | | | | | |--- weights: [8.00, 0.00] class: 0 | | | | | | |--- avg_price_per_room > 123.50 | | | | | | | |--- arrival_month <= 5.50 | | | | | | | | |--- arrival_date <= 6.50 | | | | | | | | | |--- avg_price_per_room <= 146.42 | | | | | | | | | | |--- arrival_month <= 3.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- arrival_month > 3.00 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | |--- avg_price_per_room > 146.42 | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | | | |--- arrival_date > 6.50 | | | | | | | | | |--- weights: [37.00, 0.00] class: 0 | | | | | | | |--- arrival_month > 5.50 | | | | | | | | |--- avg_price_per_room <= 129.50 | | | | | | | | | |--- avg_price_per_room <= 128.75 | | | | | | | | | | |--- arrival_date <= 8.00 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | | | |--- arrival_date > 8.00 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | |--- avg_price_per_room > 128.75 | | | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | | | |--- avg_price_per_room > 129.50 | | | | | | | | | |--- avg_price_per_room <= 139.57 | | | | | | | | | | |--- weights: [25.00, 0.00] class: 0 | | | | | | | | | |--- avg_price_per_room > 139.57 | | | | | | | | | | 
|--- avg_price_per_room <= 167.75 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | | | | |--- avg_price_per_room > 167.75 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | |--- lead_time > 9.50 | | | | | |--- no_of_week_nights <= 6.50 | | | | | | |--- required_car_parking_space <= 0.50 | | | | | | | |--- avg_price_per_room <= 118.30 | | | | | | | | |--- lead_time <= 67.50 | | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | | |--- arrival_month <= 1.50 | | | | | | | | | | | |--- weights: [86.00, 0.00] class: 0 | | | | | | | | | | |--- arrival_month > 1.50 | | | | | | | | | | | |--- truncated branch of depth 24 | | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | | |--- weights: [177.00, 0.00] class: 0 | | | | | | | | |--- lead_time > 67.50 | | | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | | | |--- arrival_month <= 7.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- arrival_month > 7.50 | | | | | | | | | | | |--- truncated branch of depth 14 | | | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | | | |--- arrival_month <= 9.50 | | | | | | | | | | | |--- truncated branch of depth 18 | | | | | | | | | | |--- arrival_month > 9.50 | | | | | | | | | | | |--- truncated branch of depth 12 | | | | | | | |--- avg_price_per_room > 118.30 | | | | | | | | |--- lead_time <= 150.50 | | | | | | | | | |--- arrival_month <= 8.50 | | | | | | | | | | |--- arrival_month <= 3.50 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | | | |--- arrival_month > 3.50 | | | | | | | | | | | |--- truncated branch of depth 17 | | | | | | | | | |--- arrival_month > 8.50 | | | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | | | |--- truncated branch of depth 19 | | | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | |--- lead_time > 150.50 | | | | | | | | | |--- weights: [0.00, 7.00] class: 1 | | | | | | 
|--- required_car_parking_space > 0.50 | | | | | | | |--- lead_time <= 150.00 | | | | | | | | |--- weights: [166.00, 0.00] class: 0 | | | | | | | |--- lead_time > 150.00 | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | |--- no_of_week_nights > 6.50 | | | | | | |--- no_of_weekend_nights <= 2.50 | | | | | | | |--- lead_time <= 107.50 | | | | | | | | |--- avg_price_per_room <= 68.03 | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- avg_price_per_room > 68.03 | | | | | | | | | |--- lead_time <= 20.50 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | |--- lead_time > 20.50 | | | | | | | | | | |--- weights: [9.00, 0.00] class: 0 | | | | | | | |--- lead_time > 107.50 | | | | | | | | |--- avg_price_per_room <= 105.40 | | | | | | | | | |--- arrival_date <= 17.50 | | | | | | | | | | |--- arrival_date <= 14.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- arrival_date > 14.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | |--- arrival_date > 17.50 | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 105.40 | | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | |--- no_of_weekend_nights > 2.50 | | | | | | | |--- arrival_date <= 20.50 | | | | | | | | |--- lead_time <= 127.00 | | | | | | | | | |--- lead_time <= 14.00 | | | | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | |--- lead_time > 14.00 | | | | | | | | | | |--- weights: [0.00, 19.00] class: 1 | | | | | | | | |--- lead_time > 127.00 | | | | | | | | | |--- no_of_week_nights <= 9.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | |--- no_of_week_nights > 9.50 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | |--- arrival_date > 20.50 | 
| | | | | | | |--- no_of_week_nights <= 8.50 | | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | | | |--- no_of_week_nights > 8.50 | | | | | | | | | |--- avg_price_per_room <= 75.15 | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | | |--- avg_price_per_room > 75.15 | | | | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | |--- no_of_special_requests > 1.50 | | | |--- lead_time <= 89.50 | | | | |--- no_of_week_nights <= 3.50 | | | | | |--- weights: [2120.00, 0.00] class: 0 | | | | |--- no_of_week_nights > 3.50 | | | | | |--- no_of_week_nights <= 9.50 | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | | |--- lead_time <= 6.50 | | | | | | | | |--- weights: [40.00, 0.00] class: 0 | | | | | | | |--- lead_time > 6.50 | | | | | | | | |--- no_of_special_requests <= 2.50 | | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | | |--- arrival_date <= 2.50 | | | | | | | | | | | |--- weights: [16.00, 0.00] class: 0 | | | | | | | | | | |--- arrival_date > 2.50 | | | | | | | | | | | |--- truncated branch of depth 13 | | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | | |--- weights: [17.00, 0.00] class: 0 | | | | | | | | |--- no_of_special_requests > 2.50 | | | | | | | | | |--- weights: [36.00, 0.00] class: 0 | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | |--- arrival_month <= 3.50 | | | | | | | | |--- avg_price_per_room <= 122.88 | | | | | | | | | |--- weights: [6.00, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 122.88 | | | | | | | | | |--- arrival_date <= 13.00 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | |--- arrival_date > 13.00 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | |--- arrival_month > 3.50 | | | | | | | | |--- lead_time <= 80.00 | | | | | | | | | |--- 
weights: [74.00, 0.00] class: 0 | | | | | | | | |--- lead_time > 80.00 | | | | | | | | | |--- lead_time <= 81.50 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | |--- lead_time > 81.50 | | | | | | | | | | |--- weights: [9.00, 0.00] class: 0 | | | | | |--- no_of_week_nights > 9.50 | | | | | | |--- no_of_special_requests <= 2.50 | | | | | | | |--- weights: [0.00, 5.00] class: 1 | | | | | | |--- no_of_special_requests > 2.50 | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | |--- lead_time > 89.50 | | | | |--- avg_price_per_room <= 202.14 | | | | | |--- lead_time <= 150.50 | | | | | | |--- arrival_month <= 8.50 | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | |--- arrival_month <= 7.50 | | | | | | | | | |--- arrival_date <= 2.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | |--- arrival_date > 2.50 | | | | | | | | | | |--- arrival_date <= 21.00 | | | | | | | | | | | |--- weights: [0.00, 6.00] class: 1 | | | | | | | | | | |--- arrival_date > 21.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | |--- arrival_month > 7.50 | | | | | | | | | |--- arrival_date <= 14.00 | | | | | | | | | | |--- weights: [9.00, 0.00] class: 0 | | | | | | | | | |--- arrival_date > 14.00 | | | | | | | | | | |--- market_segment_type_Offline <= 0.50 | | | | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | | | | | |--- market_segment_type_Offline > 0.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | |--- arrival_date <= 2.50 | | | | | | | | | |--- arrival_date <= 1.50 | | | | | | | | | | |--- arrival_month <= 7.50 | | | | | | | | | | | |--- weights: [9.00, 0.00] class: 0 | | | | | | | | | | |--- arrival_month > 7.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- arrival_date > 1.50 | | | | | | | | | | |--- no_of_week_nights <= 3.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | 
| | | | |--- no_of_week_nights > 3.50 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | |--- arrival_date > 2.50 | | | | | | | | | |--- no_of_week_nights <= 7.00 | | | | | | | | | | |--- arrival_date <= 25.50 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | | | |--- arrival_date > 25.50 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | | | | | |--- no_of_week_nights > 7.00 | | | | | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | |--- arrival_month > 8.50 | | | | | | | |--- no_of_special_requests <= 2.50 | | | | | | | | |--- no_of_week_nights <= 0.50 | | | | | | | | | |--- arrival_month <= 10.50 | | | | | | | | | | |--- weights: [6.00, 0.00] class: 0 | | | | | | | | | |--- arrival_month > 10.50 | | | | | | | | | | |--- weights: [0.00, 8.00] class: 1 | | | | | | | | |--- no_of_week_nights > 0.50 | | | | | | | | | |--- no_of_weekend_nights <= 1.50 | | | | | | | | | | |--- arrival_date <= 9.50 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | | | | |--- arrival_date > 9.50 | | | | | | | | | | | |--- truncated branch of depth 11 | | | | | | | | | |--- no_of_weekend_nights > 1.50 | | | | | | | | | | |--- no_of_week_nights <= 3.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | | |--- no_of_week_nights > 3.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | |--- no_of_special_requests > 2.50 | | | | | | | | |--- weights: [46.00, 0.00] class: 0 | | | | | |--- lead_time > 150.50 | | | | | | |--- avg_price_per_room <= 98.17 | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | |--- avg_price_per_room > 98.17 | | | | | | | |--- arrival_month <= 11.00 | | | | | | | | |--- weights: [0.00, 8.00] class: 1 | | | | | | | |--- arrival_month > 11.00 | | | | | 
| | | |--- weights: [1.00, 0.00] class: 0 | | | | |--- avg_price_per_room > 202.14 | | | | | |--- weights: [0.00, 9.00] class: 1 |--- lead_time > 151.50 | |--- avg_price_per_room <= 100.04 | | |--- no_of_special_requests <= 0.50 | | | |--- no_of_adults <= 1.50 | | | | |--- market_segment_type_Online <= 0.50 | | | | | |--- lead_time <= 163.50 | | | | | | |--- arrival_date <= 7.00 | | | | | | | |--- weights: [0.00, 14.00] class: 1 | | | | | | |--- arrival_date > 7.00 | | | | | | | |--- weights: [7.00, 0.00] class: 0 | | | | | |--- lead_time > 163.50 | | | | | | |--- lead_time <= 341.00 | | | | | | | |--- lead_time <= 173.00 | | | | | | | | |--- avg_price_per_room <= 97.50 | | | | | | | | | |--- avg_price_per_room <= 70.85 | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | | |--- avg_price_per_room > 70.85 | | | | | | | | | | |--- weights: [0.00, 11.00] class: 1 | | | | | | | | |--- avg_price_per_room > 97.50 | | | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | | | |--- weights: [59.00, 4.00] class: 0 | | | | | | | |--- lead_time > 173.00 | | | | | | | | |--- avg_price_per_room <= 98.00 | | | | | | | | | |--- arrival_month <= 5.50 | | | | | | | | | | |--- lead_time <= 208.00 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | | | |--- lead_time > 208.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- arrival_month > 5.50 | | | | | | | | | | |--- lead_time <= 312.00 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- lead_time > 312.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | |--- avg_price_per_room > 98.00 | | | | | | | | | |--- arrival_date <= 13.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | |--- arrival_date > 13.50 | | | | | | | | | | |--- weights: [0.00, 5.00] class: 1 | | | | | | |--- 
lead_time > 341.00 | | | | | | | |--- avg_price_per_room <= 75.00 | | | | | | | | |--- weights: [5.00, 0.00] class: 0 | | | | | | | |--- avg_price_per_room > 75.00 | | | | | | | | |--- avg_price_per_room <= 88.00 | | | | | | | | | |--- weights: [0.00, 6.00] class: 1 | | | | | | | | |--- avg_price_per_room > 88.00 | | | | | | | | | |--- no_of_week_nights <= 1.00 | | | | | | | | | | |--- weights: [2.00, 3.00] class: 1 | | | | | | | | | |--- no_of_week_nights > 1.00 | | | | | | | | | | |--- lead_time <= 363.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- lead_time > 363.00 | | | | | | | | | | | |--- weights: [4.00, 1.00] class: 0 | | | | |--- market_segment_type_Online > 0.50 | | | | | |--- avg_price_per_room <= 2.50 | | | | | | |--- lead_time <= 285.50 | | | | | | | |--- type_of_meal_plan_Meal Plan 2 <= 0.50 | | | | | | | | |--- weights: [8.00, 0.00] class: 0 | | | | | | | |--- type_of_meal_plan_Meal Plan 2 > 0.50 | | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | | |--- weights: [1.00, 1.00] class: 0 | | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | |--- lead_time > 285.50 | | | | | | | |--- no_of_weekend_nights <= 1.00 | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | | |--- no_of_weekend_nights > 1.00 | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | |--- avg_price_per_room > 2.50 | | | | | | |--- arrival_month <= 11.50 | | | | | | | |--- weights: [0.00, 51.00] class: 1 | | | | | | |--- arrival_month > 11.50 | | | | | | | |--- no_of_week_nights <= 2.50 | | | | | | | | |--- avg_price_per_room <= 58.59 | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- avg_price_per_room > 58.59 | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | |--- no_of_week_nights > 2.50 | | | | | | | | |--- weights: [0.00, 4.00] class: 1 | | | |--- no_of_adults > 1.50 | | | | |--- market_segment_type_Online <= 0.50 | | | | | 
|--- avg_price_per_room <= 84.62 | | | | | | |--- lead_time <= 244.00 | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | |--- no_of_weekend_nights <= 1.50 | | | | | | | | | |--- lead_time <= 166.50 | | | | | | | | | | |--- weights: [5.00, 0.00] class: 0 | | | | | | | | | |--- lead_time > 166.50 | | | | | | | | | | |--- avg_price_per_room <= 69.34 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | | | |--- avg_price_per_room > 69.34 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | |--- no_of_weekend_nights > 1.50 | | | | | | | | | |--- weights: [19.00, 0.00] class: 0 | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | |--- avg_price_per_room <= 66.50 | | | | | | | | | |--- no_of_weekend_nights <= 0.50 | | | | | | | | | | |--- no_of_week_nights <= 4.00 | | | | | | | | | | | |--- weights: [0.00, 9.00] class: 1 | | | | | | | | | | |--- no_of_week_nights > 4.00 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | |--- no_of_weekend_nights > 0.50 | | | | | | | | | | |--- avg_price_per_room <= 27.77 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | | | |--- avg_price_per_room > 27.77 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | |--- avg_price_per_room > 66.50 | | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 <= 0.50 | | | | | | | | | | |--- arrival_date <= 4.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- arrival_date > 4.50 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 > 0.50 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | |--- lead_time > 244.00 | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | |--- weights: [39.00, 0.00] class: 0 | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | |--- avg_price_per_room <= 76.00 | | | | | | | | | | |--- avg_price_per_room 
<= 45.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | | |--- avg_price_per_room > 45.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | |--- avg_price_per_room > 76.00 | | | | | | | | | | |--- arrival_date <= 19.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- arrival_date > 19.50 | | | | | | | | | | | |--- weights: [19.00, 0.00] class: 0 | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | |--- weights: [33.00, 0.00] class: 0 | | | | | |--- avg_price_per_room > 84.62 | | | | | | |--- no_of_adults <= 2.50 | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | | | | |--- lead_time <= 316.00 | | | | | | | | | | |--- no_of_week_nights <= 4.50 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | | | | |--- no_of_week_nights > 4.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | |--- lead_time > 316.00 | | | | | | | | | | |--- no_of_week_nights <= 4.50 | | | | | | | | | | | |--- weights: [4.00, 0.00] class: 0 | | | | | | | | | | |--- no_of_week_nights > 4.50 | | | | | | | | | | | |--- weights: [1.00, 4.00] class: 1 | | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | | | |--- weights: [5.00, 0.00] class: 0 | | | | | | | |--- arrival_month > 11.50 | | | | | | | | |--- weights: [5.00, 0.00] class: 0 | | | | | | |--- no_of_adults > 2.50 | | | | | | | |--- weights: [9.00, 0.00] class: 0 | | | | |--- market_segment_type_Online > 0.50 | | | | | |--- no_of_adults <= 2.50 | | | | | | |--- arrival_month <= 11.50 | | | | | | | |--- weights: [0.00, 460.00] class: 1 | | | | | | |--- arrival_month > 11.50 | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | |--- arrival_date <= 3.50 | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | |--- arrival_date > 3.50 | | | | | | | | | |--- lead_time <= 214.50 | | | | | | | | | | |--- weights: [1.00, 
0.00] class: 0 | | | | | | | | | |--- lead_time > 214.50 | | | | | | | | | | |--- weights: [0.00, 7.00] class: 1 | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | |--- no_of_weekend_nights <= 0.50 | | | | | | | | | |--- avg_price_per_room <= 76.87 | | | | | | | | | | |--- weights: [0.00, 8.00] class: 1 | | | | | | | | | |--- avg_price_per_room > 76.87 | | | | | | | | | | |--- arrival_date <= 20.00 | | | | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | | | | | |--- arrival_date > 20.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | |--- no_of_weekend_nights > 0.50 | | | | | | | | | |--- weights: [0.00, 54.00] class: 1 | | | | | |--- no_of_adults > 2.50 | | | | | | |--- weights: [1.00, 0.00] class: 0 | | |--- no_of_special_requests > 0.50 | | | |--- no_of_weekend_nights <= 0.50 | | | | |--- lead_time <= 180.50 | | | | | |--- lead_time <= 158.50 | | | | | | |--- arrival_month <= 9.00 | | | | | | | |--- avg_price_per_room <= 98.81 | | | | | | | | |--- weights: [8.00, 0.00] class: 0 | | | | | | | |--- avg_price_per_room > 98.81 | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | |--- arrival_month > 9.00 | | | | | | | |--- arrival_date <= 12.00 | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | |--- arrival_date > 12.00 | | | | | | | | |--- weights: [0.00, 5.00] class: 1 | | | | | |--- lead_time > 158.50 | | | | | | |--- arrival_date <= 2.00 | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | |--- arrival_date > 2.00 | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | |--- weights: [40.00, 0.00] class: 0 | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | |--- lead_time <= 175.00 | | | | | | | | | |--- weights: [10.00, 0.00] class: 0 | | | | | | | | |--- lead_time > 175.00 | | | | | | | | | |--- 
arrival_month <= 7.50 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | |--- arrival_month > 7.50 | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | |--- lead_time > 180.50 | | | | | |--- no_of_special_requests <= 2.50 | | | | | | |--- market_segment_type_Online <= 0.50 | | | | | | | |--- arrival_date <= 10.50 | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | |--- arrival_date > 10.50 | | | | | | | | |--- lead_time <= 302.50 | | | | | | | | | |--- weights: [15.00, 0.00] class: 0 | | | | | | | | |--- lead_time > 302.50 | | | | | | | | | |--- no_of_special_requests <= 1.50 | | | | | | | | | | |--- weights: [2.00, 1.00] class: 0 | | | | | | | | | |--- no_of_special_requests > 1.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | |--- market_segment_type_Online > 0.50 | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | |--- no_of_week_nights <= 0.50 | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | |--- no_of_week_nights > 0.50 | | | | | | | | | |--- weights: [0.00, 116.00] class: 1 | | | | | | | |--- arrival_month > 11.50 | | | | | | | | |--- lead_time <= 300.00 | | | | | | | | | |--- no_of_children <= 0.50 | | | | | | | | | | |--- arrival_date <= 6.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- arrival_date > 6.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | |--- no_of_children > 0.50 | | | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | | | |--- lead_time > 300.00 | | | | | | | | | |--- weights: [0.00, 6.00] class: 1 | | | | | |--- no_of_special_requests > 2.50 | | | | | | |--- weights: [16.00, 0.00] class: 0 | | | |--- no_of_weekend_nights > 0.50 | | | | |--- market_segment_type_Online <= 0.50 | | | | | |--- arrival_date <= 30.50 | | | | | | |--- lead_time <= 348.50 | | | | | | | |--- weights: [151.00, 0.00] class: 0 | | | | | | |--- lead_time > 348.50 | | | | | | | |--- 
type_of_meal_plan_Meal Plan 2 <= 0.50 | | | | | | | | |--- no_of_special_requests <= 2.00 | | | | | | | | | |--- avg_price_per_room <= 58.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | |--- avg_price_per_room > 58.50 | | | | | | | | | | |--- weights: [5.00, 1.00] class: 0 | | | | | | | | |--- no_of_special_requests > 2.00 | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | |--- type_of_meal_plan_Meal Plan 2 > 0.50 | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | |--- arrival_date > 30.50 | | | | | | |--- avg_price_per_room <= 74.17 | | | | | | | |--- weights: [2.00, 1.00] class: 0 | | | | | | |--- avg_price_per_room > 74.17 | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | |--- market_segment_type_Online > 0.50 | | | | | |--- arrival_month <= 11.50 | | | | | | |--- no_of_week_nights <= 9.50 | | | | | | | |--- avg_price_per_room <= 76.48 | | | | | | | | |--- arrival_date <= 27.00 | | | | | | | | | |--- lead_time <= 289.00 | | | | | | | | | | |--- weights: [51.00, 0.00] class: 0 | | | | | | | | | |--- lead_time > 289.00 | | | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- arrival_date > 27.00 | | | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | | | |--- lead_time <= 236.00 | | | | | | | | | | | |--- weights: [7.00, 0.00] class: 0 | | | | | | | | | | |--- lead_time > 236.00 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | |--- avg_price_per_room > 76.48 | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 <= 0.50 | | | | | | | | | | |--- no_of_week_nights <= 6.50 | | | | | | | | | | | |--- truncated 
branch of depth 16 | | | | | | | | | | |--- no_of_week_nights > 6.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 > 0.50 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | |--- avg_price_per_room <= 80.42 | | | | | | | | | | |--- arrival_date <= 22.50 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | | | |--- arrival_date > 22.50 | | | | | | | | | | | |--- weights: [0.00, 6.00] class: 1 | | | | | | | | | |--- avg_price_per_room > 80.42 | | | | | | | | | | |--- avg_price_per_room <= 93.58 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | | |--- avg_price_per_room > 93.58 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | |--- no_of_week_nights > 9.50 | | | | | | | |--- weights: [0.00, 5.00] class: 1 | | | | | |--- arrival_month > 11.50 | | | | | | |--- arrival_date <= 29.50 | | | | | | | |--- avg_price_per_room <= 55.92 | | | | | | | | |--- weights: [4.00, 0.00] class: 0 | | | | | | | |--- avg_price_per_room > 55.92 | | | | | | | | |--- avg_price_per_room <= 80.19 | | | | | | | | | |--- no_of_special_requests <= 2.50 | | | | | | | | | | |--- lead_time <= 195.50 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | | | |--- lead_time > 195.50 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | | |--- no_of_special_requests > 2.50 | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 80.19 | | | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | | | |--- arrival_date <= 28.50 | | | | | | | | | | | |--- weights: [5.00, 0.00] class: 0 | | | | | | | | | | |--- arrival_date > 28.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | |--- arrival_date > 29.50 | | 
| | | | | |--- weights: [0.00, 5.00] class: 1 | |--- avg_price_per_room > 100.04 | | |--- arrival_month <= 11.50 | | | |--- no_of_special_requests <= 2.50 | | | | |--- weights: [0.00, 2091.00] class: 1 | | | |--- no_of_special_requests > 2.50 | | | | |--- weights: [32.00, 0.00] class: 0 | | |--- arrival_month > 11.50 | | | |--- no_of_special_requests <= 0.50 | | | | |--- weights: [59.00, 0.00] class: 0 | | | |--- no_of_special_requests > 0.50 | | | | |--- arrival_date <= 24.50 | | | | | |--- weights: [5.00, 0.00] class: 0 | | | | |--- arrival_date > 24.50 | | | | | |--- no_of_weekend_nights <= 1.50 | | | | | | |--- avg_price_per_room <= 119.00 | | | | | | | |--- avg_price_per_room <= 103.92 | | | | | | | | |--- no_of_weekend_nights <= 0.50 | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- no_of_weekend_nights > 0.50 | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | |--- avg_price_per_room > 103.92 | | | | | | | | |--- weights: [0.00, 5.00] class: 1 | | | | | | |--- avg_price_per_room > 119.00 | | | | | | | |--- no_of_week_nights <= 4.50 | | | | | | | | |--- weights: [5.00, 0.00] class: 0 | | | | | | | |--- no_of_week_nights > 4.50 | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | |--- no_of_weekend_nights > 1.50 | | | | | | |--- weights: [0.00, 7.00] class: 1
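The `truncated branch of depth N` entries in the rules above appear because `tree.export_text` prints only the first `max_depth` levels (10 by default) and summarizes anything deeper. A minimal sketch on synthetic data shows how that parameter controls the length of the report. The names `X_demo`, `y_demo`, `demo_names`, and `demo_tree` are illustrative stand-ins, not the hotel data or `model0`.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in data (assumption: NOT the INN Hotels dataset)
X_demo, y_demo = make_classification(n_samples=300, n_features=5, random_state=1)
demo_names = [f"feature_{i}" for i in range(X_demo.shape[1])]

demo_tree = DecisionTreeClassifier(random_state=1).fit(X_demo, y_demo)

# Default report: prints up to max_depth=10 levels of the tree
full_report = export_text(demo_tree, feature_names=demo_names, show_weights=True)

# Shallow report: deeper subtrees are collapsed into "truncated branch" lines
short_report = export_text(
    demo_tree, feature_names=demo_names, show_weights=True, max_depth=2
)
print(short_report)
```

Raising `max_depth` in the call would print the full set of rules, at the cost of a much longer report.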
# importance of features in the tree building
importances = model0.feature_importances_
indices = np.argsort(importances)
plt.figure(figsize=(8, 8))
plt.title("Feature Importances")
plt.barh(range(len(indices)), importances[indices], color="violet", align="center")
plt.yticks(range(len(indices)), [feature_names[i] for i in indices])
plt.xlabel("Relative Importance")
plt.show()
Decision Tree (Pre-pruning)¶
Using GridSearch for Hyperparameter tuning of our tree model
- Hyperparameter tuning is tricky because there is no direct way to calculate how a change in a hyperparameter value will affect the model's loss, so we usually resort to experimentation. Here we use grid search.
- Grid search is a tuning technique that looks for the best-performing values of the hyperparameters.
- It is an exhaustive search performed over a specified set of parameter values for a model.
- Each combination in the parameter grid is scored by cross-validation, and the combination with the best cross-validated score is selected.
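To make "exhaustive" concrete, the sketch below enumerates every combination in a grid with the same values as the one defined in this section, using only the standard library. The names `grid`, `candidates`, and `n_folds` are illustrative, and the fit count assumes 5-fold cross-validation without counting the final refit.

```python
from itertools import product

# Same values as the notebook's grid; np.arange(2, 7, 2) expands to [2, 4, 6]
grid = {
    "max_depth": [2, 4, 6],
    "max_leaf_nodes": [10, 20, 50, 100],
    "min_samples_split": [10, 30, 50, 70],
    "criterion": ["entropy", "gini"],
    "splitter": ["best", "random"],
}

# Grid search tries the Cartesian product of all value lists
candidates = [dict(zip(grid, values)) for values in product(*grid.values())]

n_folds = 5  # cv=5 in the notebook
n_fits = len(candidates) * n_folds

print(f"{len(candidates)} candidate settings, {n_fits} cross-validation fits")
print(candidates[0])
```

Because the number of candidates grows multiplicatively, adding even one more value per parameter increases the run time noticeably.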
# Choose the type of classifier.
estimator = DecisionTreeClassifier(random_state=1, class_weight="balanced")
# Grid of parameters to choose from
parameters = {
    "max_depth": np.arange(2, 7, 2),
    "max_leaf_nodes": [10, 20, 50, 100],
    "min_samples_split": [10, 30, 50, 70],
    "criterion": ["entropy", "gini"],
    "splitter": ["best", "random"],
}
# Type of scoring used to compare parameter combinations
acc_scorer = make_scorer(f1_score)
# Run the grid search
grid_obj = GridSearchCV(estimator, parameters, scoring=acc_scorer, cv=5)
grid_obj = grid_obj.fit(X_train, y_train)
# Set the clf to the best combination of parameters
estimator = grid_obj.best_estimator_
# Fit the best algorithm to the data.
estimator.fit(X_train, y_train)
DecisionTreeClassifier(class_weight='balanced', max_depth=6, max_leaf_nodes=50,
min_samples_split=10, random_state=1)
confusion_matrix_sklearn(estimator, X_train, y_train)
decision_tree_tune_perf_train = model_performance_classification_sklearn(
estimator, X_train, y_train
)
decision_tree_tune_perf_train
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.838137 | 0.781337 | 0.737487 | 0.758779 |
confusion_matrix_sklearn(estimator, X_test, y_test)
decision_tree_tune_perf_test = model_performance_classification_sklearn(
estimator, X_test, y_test
)
decision_tree_tune_perf_test
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.831827 | 0.779105 | 0.731733 | 0.754676 |
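One quick way to judge generalization is to subtract the test metrics from the training metrics. The sketch below does this with the values reported in the two tables above. The dictionary names are illustrative, and what counts as an acceptable gap is a judgment call rather than a fixed rule.

```python
# Metrics copied from the pre-pruned tree's train and test tables
train_scores = {"Accuracy": 0.838137, "Recall": 0.781337, "Precision": 0.737487, "F1": 0.758779}
test_scores = {"Accuracy": 0.831827, "Recall": 0.779105, "Precision": 0.731733, "F1": 0.754676}

# A positive gap means the model scores higher on the training data
gaps = {name: train_scores[name] - test_scores[name] for name in train_scores}

for name, gap in gaps.items():
    print(f"{name:<9} train={train_scores[name]:.3f} test={test_scores[name]:.3f} gap={gap:.4f}")
```

Every gap is below one percentage point, which is consistent with a tree that is not overfitting the training data.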
- The model now gives a generalized result: recall is about 0.78 on both the train and test data (F1 is about 0.76 and 0.75), which shows that the model generalizes well to unseen data.
Visualizing the Decision Tree¶
feature_names = list(X_train.columns)
importances = estimator.feature_importances_
indices = np.argsort(importances)
plt.figure(figsize=(20, 10))
out = tree.plot_tree(
estimator,
feature_names=feature_names,
filled=True,
fontsize=9,
node_ids=False,
class_names=None,
)
# below code will add arrows to the decision tree split if they are missing
for o in out:
    arrow = o.arrow_patch
    if arrow is not None:
        arrow.set_edgecolor("black")
        arrow.set_linewidth(1)
plt.show()
# Text report showing the rules of a decision tree -
print(tree.export_text(estimator, feature_names=feature_names, show_weights=True))
|--- lead_time <= 151.50
|   |--- no_of_special_requests <= 0.50
|   |   |--- market_segment_type_Online <= 0.50
|   |   |   |--- lead_time <= 90.50
|   |   |   |   |--- no_of_weekend_nights <= 0.50
|   |   |   |   |   |--- avg_price_per_room <= 201.50
|   |   |   |   |   |   |--- weights: [1772.53, 136.58] class: 0
|   |   |   |   |   |--- avg_price_per_room > 201.50
|   |   |   |   |   |   |--- weights: [0.74, 29.16] class: 1
|   |   |   |   |--- no_of_weekend_nights > 0.50
|   |   |   |   |   |--- lead_time <= 65.50
|   |   |   |   |   |   |--- weights: [927.05, 211.77] class: 0
|   |   |   |   |   |--- lead_time > 65.50
|   |   |   |   |   |   |--- weights: [143.88, 164.20] class: 1
|   |   |   |--- lead_time > 90.50
|   |   |   |   |--- lead_time <= 117.50
|   |   |   |   |   |--- arrival_month <= 10.50
|   |   |   |   |   |   |--- weights: [247.71, 511.02] class: 1
|   |   |   |   |   |--- arrival_month > 10.50
|   |   |   |   |   |   |--- weights: [40.79, 3.07] class: 0
|   |   |   |   |--- lead_time > 117.50
|   |   |   |   |   |--- avg_price_per_room <= 89.88
|   |   |   |   |   |   |--- weights: [169.09, 24.55] class: 0
|   |   |   |   |   |--- avg_price_per_room > 89.88
|   |   |   |   |   |   |--- weights: [123.11, 121.23] class: 0
|   |   |--- market_segment_type_Online > 0.50
|   |   |   |--- lead_time <= 9.50
|   |   |   |   |--- lead_time <= 3.50
|   |   |   |   |   |--- avg_price_per_room <= 202.67
|   |   |   |   |   |   |--- weights: [468.72, 124.30] class: 0
|   |   |   |   |   |--- avg_price_per_room > 202.67
|   |   |   |   |   |   |--- weights: [0.00, 16.88] class: 1
|   |   |   |   |--- lead_time > 3.50
|   |   |   |   |   |--- arrival_month <= 8.50
|   |   |   |   |   |   |--- weights: [114.21, 199.50] class: 1
|   |   |   |   |   |--- arrival_month > 8.50
|   |   |   |   |   |   |--- weights: [117.92, 32.23] class: 0
|   |   |   |--- lead_time > 9.50
|   |   |   |   |--- avg_price_per_room <= 99.82
|   |   |   |   |   |--- lead_time <= 26.50
|   |   |   |   |   |   |--- weights: [186.89, 156.53] class: 0
|   |   |   |   |   |--- lead_time > 26.50
|   |   |   |   |   |   |--- weights: [423.48, 1091.09] class: 1
|   |   |   |   |--- avg_price_per_room > 99.82
|   |   |   |   |   |--- required_car_parking_space <= 0.50
|   |   |   |   |   |   |--- weights: [480.58, 2567.36] class: 1
|   |   |   |   |   |--- required_car_parking_space > 0.50
|   |   |   |   |   |   |--- weights: [36.34, 3.07] class: 0
|   |--- no_of_special_requests > 0.50
|   |   |--- no_of_special_requests <= 1.50
|   |   |   |--- market_segment_type_Online <= 0.50
|   |   |   |   |--- type_of_meal_plan_Not Selected <= 0.50
|   |   |   |   |   |--- no_of_week_nights <= 7.50
|   |   |   |   |   |   |--- weights: [780.95, 19.95] class: 0
|   |   |   |   |   |--- no_of_week_nights > 7.50
|   |   |   |   |   |   |--- weights: [0.00, 3.07] class: 1
|   |   |   |   |--- type_of_meal_plan_Not Selected > 0.50
|   |   |   |   |   |--- lead_time <= 60.50
|   |   |   |   |   |   |--- weights: [13.35, 1.53] class: 0
|   |   |   |   |   |--- lead_time > 60.50
|   |   |   |   |   |   |--- weights: [0.74, 7.67] class: 1
|   |   |   |--- market_segment_type_Online > 0.50
|   |   |   |   |--- lead_time <= 9.50
|   |   |   |   |   |--- lead_time <= 4.50
|   |   |   |   |   |   |--- weights: [516.18, 36.83] class: 0
|   |   |   |   |   |--- lead_time > 4.50
|   |   |   |   |   |   |--- weights: [295.92, 78.26] class: 0
|   |   |   |   |--- lead_time > 9.50
|   |   |   |   |   |--- required_car_parking_space <= 0.50
|   |   |   |   |   |   |--- weights: [2467.45, 1413.35] class: 0
|   |   |   |   |   |--- required_car_parking_space > 0.50
|   |   |   |   |   |   |--- weights: [123.11, 3.07] class: 0
|   |   |--- no_of_special_requests > 1.50
|   |   |   |--- lead_time <= 89.50
|   |   |   |   |--- no_of_week_nights <= 3.50
|   |   |   |   |   |--- weights: [1572.28, 0.00] class: 0
|   |   |   |   |--- no_of_week_nights > 3.50
|   |   |   |   |   |--- no_of_week_nights <= 9.50
|   |   |   |   |   |   |--- weights: [223.23, 59.85] class: 0
|   |   |   |   |   |--- no_of_week_nights > 9.50
|   |   |   |   |   |   |--- weights: [0.74, 7.67] class: 1
|   |   |   |--- lead_time > 89.50
|   |   |   |   |--- avg_price_per_room <= 202.14
|   |   |   |   |   |--- no_of_special_requests <= 2.50
|   |   |   |   |   |   |--- weights: [320.39, 153.46] class: 0
|   |   |   |   |   |--- no_of_special_requests > 2.50
|   |   |   |   |   |   |--- weights: [59.33, 0.00] class: 0
|   |   |   |   |--- avg_price_per_room > 202.14
|   |   |   |   |   |--- weights: [0.00, 13.81] class: 1
|--- lead_time > 151.50
|   |--- avg_price_per_room <= 100.04
|   |   |--- no_of_special_requests <= 0.50
|   |   |   |--- no_of_adults <= 1.50
|   |   |   |   |--- market_segment_type_Online <= 0.50
|   |   |   |   |   |--- lead_time <= 163.50
|   |   |   |   |   |   |--- weights: [5.19, 21.48] class: 1
|   |   |   |   |   |--- lead_time > 163.50
|   |   |   |   |   |   |--- weights: [249.19, 59.85] class: 0
|   |   |   |   |--- market_segment_type_Online > 0.50
|   |   |   |   |   |--- avg_price_per_room <= 2.50
|   |   |   |   |   |   |--- weights: [8.16, 6.14] class: 0
|   |   |   |   |   |--- avg_price_per_room > 2.50
|   |   |   |   |   |   |--- weights: [1.48, 85.94] class: 1
|   |   |   |--- no_of_adults > 1.50
|   |   |   |   |--- avg_price_per_room <= 82.47
|   |   |   |   |   |--- market_segment_type_Offline <= 0.50
|   |   |   |   |   |   |--- weights: [3.71, 319.19] class: 1
|   |   |   |   |   |--- market_segment_type_Offline > 0.50
|   |   |   |   |   |   |--- weights: [206.92, 372.90] class: 1
|   |   |   |   |--- avg_price_per_room > 82.47
|   |   |   |   |   |--- no_of_adults <= 2.50
|   |   |   |   |   |   |--- weights: [20.02, 995.94] class: 1
|   |   |   |   |   |--- no_of_adults > 2.50
|   |   |   |   |   |   |--- weights: [7.42, 0.00] class: 0
|   |   |--- no_of_special_requests > 0.50
|   |   |   |--- no_of_weekend_nights <= 0.50
|   |   |   |   |--- lead_time <= 180.50
|   |   |   |   |   |--- lead_time <= 158.50
|   |   |   |   |   |   |--- weights: [6.67, 9.21] class: 1
|   |   |   |   |   |--- lead_time > 158.50
|   |   |   |   |   |   |--- weights: [41.53, 4.60] class: 0
|   |   |   |   |--- lead_time > 180.50
|   |   |   |   |   |--- no_of_special_requests <= 2.50
|   |   |   |   |   |   |--- weights: [22.99, 201.03] class: 1
|   |   |   |   |   |--- no_of_special_requests > 2.50
|   |   |   |   |   |   |--- weights: [11.87, 0.00] class: 0
|   |   |   |--- no_of_weekend_nights > 0.50
|   |   |   |   |--- market_segment_type_Online <= 0.50
|   |   |   |   |   |--- weights: [120.15, 3.07] class: 0
|   |   |   |   |--- market_segment_type_Online > 0.50
|   |   |   |   |   |--- arrival_month <= 11.50
|   |   |   |   |   |   |--- weights: [222.49, 112.02] class: 0
|   |   |   |   |   |--- arrival_month > 11.50
|   |   |   |   |   |   |--- weights: [16.32, 33.76] class: 1
|   |--- avg_price_per_room > 100.04
|   |   |--- arrival_month <= 11.50
|   |   |   |--- no_of_special_requests <= 2.50
|   |   |   |   |--- weights: [0.00, 3208.81] class: 1
|   |   |   |--- no_of_special_requests > 2.50
|   |   |   |   |--- weights: [23.73, 0.00] class: 0
|   |   |--- arrival_month > 11.50
|   |   |   |--- no_of_special_requests <= 0.50
|   |   |   |   |--- weights: [43.76, 0.00] class: 0
|   |   |   |--- no_of_special_requests > 0.50
|   |   |   |   |--- arrival_date <= 24.50
|   |   |   |   |   |--- weights: [3.71, 0.00] class: 0
|   |   |   |   |--- arrival_date > 24.50
|   |   |   |   |   |--- weights: [4.45, 21.48] class: 1
importances = estimator.feature_importances_
importances
array([0.02792099, 0. , 0.01865427, 0.00629494, 0.01153544,
0.47269058, 0. , 0.02294526, 0.00069693, 0. ,
0. , 0. , 0.07279409, 0.17097881, 0. ,
0. , 0.00095233, 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0.00774418, 0.18679218])
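The raw array is hard to read on its own. As a small check, the sketch below re-enters the printed values and confirms two properties of the impurity-based importances for this tree: they sum to 1, and fewer than half of the features have a non-zero importance (a zero generally means the feature was not used in any split). The variable names are illustrative, and mapping each position to a column name would need `feature_names` from the notebook.

```python
import numpy as np

# Values copied from the printed array above (27 features)
printed_importances = np.array([
    0.02792099, 0.0, 0.01865427, 0.00629494, 0.01153544,
    0.47269058, 0.0, 0.02294526, 0.00069693, 0.0,
    0.0, 0.0, 0.07279409, 0.17097881, 0.0,
    0.0, 0.00095233, 0.0, 0.0, 0.0,
    0.0, 0.0, 0.0, 0.0, 0.0,
    0.00774418, 0.18679218,
])

n_used = int(np.count_nonzero(printed_importances))
# Positions of the three largest importances, largest first
top_positions = np.argsort(printed_importances)[::-1][:3]

print("sum of importances:", round(float(printed_importances.sum()), 6))
print("features with non-zero importance:", n_used, "of", printed_importances.size)
print("positions of the three largest importances:", top_positions.tolist())
```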
# importance of features in the tree building
importances = estimator.feature_importances_
indices = np.argsort(importances)
plt.figure(figsize=(8, 8))
plt.title("Feature Importances")
plt.barh(range(len(indices)), importances[indices], color="violet", align="center")
plt.yticks(range(len(indices)), [feature_names[i] for i in indices])
plt.xlabel("Relative Importance")
plt.show()
Cost Complexity Pruning¶
The DecisionTreeClassifier provides parameters such as
min_samples_leaf and max_depth to prevent a tree from overfitting. Cost
complexity pruning provides another option to control the size of a tree. In
DecisionTreeClassifier, this pruning technique is parameterized by the
cost complexity parameter, ccp_alpha. Greater values of ccp_alpha
increase the number of nodes pruned. Here we only show the effect of
ccp_alpha on regularizing the trees and how to choose a ccp_alpha
based on validation scores.
clf = DecisionTreeClassifier(random_state=1, class_weight="balanced")
path = clf.cost_complexity_pruning_path(X_train, y_train)
ccp_alphas, impurities = path.ccp_alphas, path.impurities
pd.DataFrame(path)
| | ccp_alphas | impurities |
|---|---|---|
| 0 | 0.000000e+00 | 0.008576 |
| 1 | 2.604323e-20 | 0.008576 |
| 2 | 2.604323e-20 | 0.008576 |
| 3 | 2.604323e-20 | 0.008576 |
| 4 | 2.604323e-20 | 0.008576 |
| ... | ... | ... |
| 1720 | 9.677002e-03 | 0.327326 |
| 1721 | 9.930481e-03 | 0.337256 |
| 1722 | 1.258597e-02 | 0.349842 |
| 1723 | 3.471084e-02 | 0.419264 |
| 1724 | 8.073633e-02 | 0.500000 |
1725 rows × 2 columns
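The two columns are linked. In minimal cost-complexity pruning, the effective alpha of a pruning step is the increase in total leaf impurity divided by the number of leaves removed by that step. The sketch below checks this against the last rows of the table above. The variable names are illustrative, and the leaf counts are inferred from the arithmetic rather than read from the fitted tree.

```python
# (ccp_alpha, total leaf impurity) for the last four rows of the table above
rows = [
    (9.930481e-03, 0.337256),  # row 1721
    (1.258597e-02, 0.349842),  # row 1722
    (3.471084e-02, 0.419264),  # row 1723
    (8.073633e-02, 0.500000),  # row 1724
]

leaves_removed = []
for (_, prev_impurity), (alpha, impurity) in zip(rows, rows[1:]):
    impurity_increase = impurity - prev_impurity
    # alpha = impurity increase / leaves removed, so this ratio recovers the leaf count
    ratio = impurity_increase / alpha
    leaves_removed.append(round(ratio))
    print(f"alpha={alpha:.6f} impurity increase={impurity_increase:.6f} ratio={ratio:.3f}")
```

The ratios come out close to whole numbers, which suggests that each of these steps collapsed a small subtree into a single leaf.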
fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(ccp_alphas[:-1], impurities[:-1], marker="o", drawstyle="steps-post")
ax.set_xlabel("effective alpha")
ax.set_ylabel("total impurity of leaves")
ax.set_title("Total Impurity vs effective alpha for training set")
plt.show()
Next, we train a decision tree using the effective alphas. The last value
in ccp_alphas is the alpha value that prunes the whole tree,
leaving the tree, clfs[-1], with one node.
clfs = []
for ccp_alpha in ccp_alphas:
    clf = DecisionTreeClassifier(
        random_state=1, ccp_alpha=ccp_alpha, class_weight="balanced"
    )
    clf.fit(X_train, y_train)
    clfs.append(clf)
print(
    "Number of nodes in the last tree is: {} with ccp_alpha: {}".format(
        clfs[-1].tree_.node_count, ccp_alphas[-1]
    )
)
Number of nodes in the last tree is: 1 with ccp_alpha: 0.08073633158419125
For the remainder, we remove the last element in
clfs and ccp_alphas, because it is the trivial tree with only one
node. Here we show that the number of nodes and the tree depth decrease as alpha
increases.
clfs = clfs[:-1]
ccp_alphas = ccp_alphas[:-1]
node_counts = [clf.tree_.node_count for clf in clfs]
depth = [clf.tree_.max_depth for clf in clfs]
fig, ax = plt.subplots(2, 1, figsize=(10, 7))
ax[0].plot(ccp_alphas, node_counts, marker="o", drawstyle="steps-post")
ax[0].set_xlabel("alpha")
ax[0].set_ylabel("number of nodes")
ax[0].set_title("Number of nodes vs alpha")
ax[1].plot(ccp_alphas, depth, marker="o", drawstyle="steps-post")
ax[1].set_xlabel("alpha")
ax[1].set_ylabel("depth of tree")
ax[1].set_title("Depth vs alpha")
fig.tight_layout()
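The relationship between alpha and tree size can also be checked numerically instead of read off the plot. The following is a minimal sketch on synthetic data: `make_classification` and all `demo_*` names are stand-ins introduced for illustration, not the INN Hotels data or variables from this notebook.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption: not the hotel bookings dataset).
X_demo, y_demo = make_classification(n_samples=400, n_features=8, random_state=1)

# Effective alphas of the fully grown tree, in increasing order.
demo_path = DecisionTreeClassifier(random_state=1).cost_complexity_pruning_path(
    X_demo, y_demo
)

demo_node_counts = []
for alpha in demo_path.ccp_alphas:
    # Guard against tiny negative values caused by floating-point error,
    # since ccp_alpha must be non-negative.
    alpha = max(float(alpha), 0.0)
    demo_clf = DecisionTreeClassifier(random_state=1, ccp_alpha=alpha)
    demo_clf.fit(X_demo, y_demo)
    demo_node_counts.append(demo_clf.tree_.node_count)

# Each tree should have no more nodes than the one fitted with a smaller alpha.
nodes_non_increasing = all(
    a >= b for a, b in zip(demo_node_counts, demo_node_counts[1:])
)
print("Node counts never increase with alpha:", nodes_non_increasing)
```

The same check could be run on `node_counts` from the cell above once the notebook variables are available.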
F1 Score vs alpha for training and testing sets¶
f1_train = []
for clf in clfs:
    pred_train = clf.predict(X_train)
    values_train = f1_score(y_train, pred_train)
    f1_train.append(values_train)
f1_test = []
for clf in clfs:
    pred_test = clf.predict(X_test)
    values_test = f1_score(y_test, pred_test)
    f1_test.append(values_test)
fig, ax = plt.subplots(figsize=(15, 5))
ax.set_xlabel("alpha")
ax.set_ylabel("F1 Score")
ax.set_title("F1 Score vs alpha for training and testing sets")
ax.plot(ccp_alphas, f1_train, marker="o", label="train", drawstyle="steps-post")
ax.plot(ccp_alphas, f1_test, marker="o", label="test", drawstyle="steps-post")
ax.legend()
plt.show()
index_best_model = np.argmax(f1_test)
best_model = clfs[index_best_model]
print("Best pruned model:\n", best_model)
Best pruned model:
DecisionTreeClassifier(ccp_alpha=6.835051072498095e-05, class_weight='balanced',
random_state=1)
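The cell above picks the alpha with the highest F1 score on the test set, which can make the test score reported afterwards look somewhat optimistic. One possible alternative, sketched below under stated assumptions, is to choose alpha by cross-validation on the training data only and keep the test set for the final check. The data is synthetic (`make_classification`), and every `demo_*` name, the 5-fold setting, and the cap of roughly 20 candidate alphas are illustrative choices rather than part of the original analysis.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, mildly imbalanced stand-in data (assumption: not the hotel data).
X_demo, y_demo = make_classification(
    n_samples=600, n_features=10, weights=[0.67, 0.33], random_state=1
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, stratify=y_demo, random_state=1
)

# Candidate alphas come from the pruning path of the training data.
demo_path = DecisionTreeClassifier(
    random_state=1, class_weight="balanced"
).cost_complexity_pruning_path(X_tr, y_tr)

# Drop the last alpha (root-only tree), clip float noise, remove duplicates.
candidate_alphas = np.unique(np.clip(demo_path.ccp_alphas[:-1], 0.0, None))
# Thin the list to roughly 20 values to keep the search cheap.
candidate_alphas = candidate_alphas[:: max(1, len(candidate_alphas) // 20)]

# Mean 5-fold cross-validated F1 for each candidate alpha (training data only).
cv_f1 = [
    cross_val_score(
        DecisionTreeClassifier(
            random_state=1, ccp_alpha=alpha, class_weight="balanced"
        ),
        X_tr,
        y_tr,
        cv=5,
        scoring="f1",
    ).mean()
    for alpha in candidate_alphas
]

best_alpha_cv = candidate_alphas[int(np.argmax(cv_f1))]

# Refit on the full training split and evaluate once on the held-out test set.
demo_best_model = DecisionTreeClassifier(
    random_state=1, ccp_alpha=best_alpha_cv, class_weight="balanced"
).fit(X_tr, y_tr)
demo_test_f1 = f1_score(y_te, demo_best_model.predict(X_te))
print("alpha chosen by CV:", best_alpha_cv)
print("test F1 of the refit model:", demo_test_f1)
```

With this setup the test set plays no part in choosing alpha, so the final F1 is a cleaner estimate of performance on unseen bookings.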
Checking model performance on training set¶
confusion_matrix_sklearn(best_model, X_train, y_train)
decision_tree_postpruned_perf_train = model_performance_classification_sklearn(
best_model, X_train, y_train
)
decision_tree_postpruned_perf_train
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.941525 | 0.953404 | 0.877681 | 0.913977 |
Checking model performance on test set¶
confusion_matrix_sklearn(best_model, X_test, y_test)
decision_tree_postpruned_perf_test = model_performance_classification_sklearn(
best_model, X_test, y_test
)
decision_tree_postpruned_perf_test
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.871771 | 0.846624 | 0.784299 | 0.81427 |
- With post-pruning, the model reaches an F1 score of about 0.91 on the training set and 0.81 on the test set, so it generalizes reasonably well, although a gap between the two remains.
- The F1 score has improved further.
Visualizing the Decision Tree¶
plt.figure(figsize=(10, 10))
out = tree.plot_tree(
best_model,
feature_names=feature_names,
filled=True,
fontsize=9,
node_ids=True,
class_names=True,
)
for o in out:
    arrow = o.arrow_patch
    if arrow is not None:
        arrow.set_edgecolor("black")
        arrow.set_linewidth(1)
plt.show()
# Text report showing the rules of the decision tree
print(tree.export_text(best_model, feature_names=feature_names, show_weights=True))
|--- lead_time <= 151.50 | |--- no_of_special_requests <= 0.50 | | |--- market_segment_type_Online <= 0.50 | | | |--- lead_time <= 90.50 | | | | |--- no_of_weekend_nights <= 0.50 | | | | | |--- avg_price_per_room <= 201.50 | | | | | | |--- market_segment_type_Offline <= 0.50 | | | | | | | |--- avg_price_per_room <= 87.25 | | | | | | | | |--- repeated_guest <= 0.50 | | | | | | | | | |--- market_segment_type_Corporate <= 0.50 | | | | | | | | | | |--- weights: [54.88, 0.00] class: 0 | | | | | | | | | |--- market_segment_type_Corporate > 0.50 | | | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | | | |--- weights: [40.79, 0.00] class: 0 | | | | | | | | |--- repeated_guest > 0.50 | | | | | | | | | |--- weights: [106.80, 1.53] class: 0 | | | | | | | |--- avg_price_per_room > 87.25 | | | | | | | | |--- lead_time <= 9.50 | | | | | | | | | |--- avg_price_per_room <= 162.53 | | | | | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | | | | | | |--- weights: [80.10, 6.14] class: 0 | | | | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- avg_price_per_room > 162.53 | | | | | | | | | | |--- weights: [2.22, 4.60] class: 1 | | | | | | | | |--- lead_time > 9.50 | | | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | | | |--- arrival_date <= 17.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- arrival_date > 17.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | | | | | |--- weights: [0.00, 12.28] class: 1 | | | | | | |--- market_segment_type_Offline > 0.50 | | | | | | | |--- weights: [1218.52, 
0.00] class: 0 | | | | | |--- avg_price_per_room > 201.50 | | | | | | |--- weights: [0.74, 29.16] class: 1 | | | | |--- no_of_weekend_nights > 0.50 | | | | | |--- lead_time <= 65.50 | | | | | | |--- arrival_month <= 9.50 | | | | | | | |--- avg_price_per_room <= 62.40 | | | | | | | | |--- arrival_date <= 20.50 | | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | | |--- weights: [31.89, 0.00] class: 0 | | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | | |--- weights: [0.74, 1.53] class: 1 | | | | | | | | |--- arrival_date > 20.50 | | | | | | | | | |--- avg_price_per_room <= 59.75 | | | | | | | | | | |--- arrival_date <= 24.00 | | | | | | | | | | | |--- weights: [0.74, 12.28] class: 1 | | | | | | | | | | |--- arrival_date > 24.00 | | | | | | | | | | | |--- weights: [17.80, 1.53] class: 0 | | | | | | | | | |--- avg_price_per_room > 59.75 | | | | | | | | | | |--- lead_time <= 39.50 | | | | | | | | | | | |--- weights: [0.00, 59.85] class: 1 | | | | | | | | | | |--- lead_time > 39.50 | | | | | | | | | | | |--- weights: [1.48, 0.00] class: 0 | | | | | | | |--- avg_price_per_room > 62.40 | | | | | | | | |--- no_of_week_nights <= 9.50 | | | | | | | | | |--- lead_time <= 59.50 | | | | | | | | | | |--- arrival_month <= 7.50 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | | | | |--- arrival_month > 7.50 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | | | |--- lead_time > 59.50 | | | | | | | | | | |--- arrival_month <= 4.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- arrival_month > 4.50 | | | | | | | | | | | |--- weights: [19.28, 0.00] class: 0 | | | | | | | | |--- no_of_week_nights > 9.50 | | | | | | | | | |--- weights: [0.00, 6.14] class: 1 | | | | | | |--- arrival_month > 9.50 | | | | | | | |--- no_of_weekend_nights <= 5.00 | | | | | | | | |--- lead_time <= 40.50 | | | | | | | | | |--- lead_time <= 39.50 | | | | | | | | | | |--- 
avg_price_per_room <= 66.62 | | | | | | | | | | | |--- weights: [97.90, 1.53] class: 0 | | | | | | | | | | |--- avg_price_per_room > 66.62 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | |--- lead_time > 39.50 | | | | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | | | | |--- weights: [0.00, 3.07] class: 1 | | | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | | | |--- weights: [11.87, 1.53] class: 0 | | | | | | | | |--- lead_time > 40.50 | | | | | | | | | |--- weights: [102.35, 0.00] class: 0 | | | | | | | |--- no_of_weekend_nights > 5.00 | | | | | | | | |--- weights: [0.00, 1.53] class: 1 | | | | | |--- lead_time > 65.50 | | | | | | |--- avg_price_per_room <= 99.98 | | | | | | | |--- lead_time <= 81.50 | | | | | | | | |--- arrival_date <= 8.50 | | | | | | | | | |--- avg_price_per_room <= 90.00 | | | | | | | | | | |--- weights: [20.77, 0.00] class: 0 | | | | | | | | | |--- avg_price_per_room > 90.00 | | | | | | | | | | |--- weights: [0.74, 1.53] class: 1 | | | | | | | | |--- arrival_date > 8.50 | | | | | | | | | |--- arrival_date <= 17.50 | | | | | | | | | | |--- no_of_week_nights <= 2.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- no_of_week_nights > 2.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- arrival_date > 17.50 | | | | | | | | | | |--- no_of_weekend_nights <= 1.50 | | | | | | | | | | | |--- weights: [19.28, 1.53] class: 0 | | | | | | | | | | |--- no_of_weekend_nights > 1.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | |--- lead_time > 81.50 | | | | | | | | |--- weights: [43.76, 4.60] class: 0 | | | | | | |--- avg_price_per_room > 99.98 | | | | | | | |--- lead_time <= 81.00 | | | | | | | | |--- avg_price_per_room <= 123.25 | | | | | | | | | |--- lead_time <= 68.50 | | | | | | | | | | |--- weights: [1.48, 0.00] class: 0 | | | | | | | | | |--- lead_time > 68.50 | | | | | | | | | | |--- arrival_month <= 2.50 | | | | | | | | | | 
| |--- weights: [0.74, 0.00] class: 0 | | | | | | | | | | |--- arrival_month > 2.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | |--- avg_price_per_room > 123.25 | | | | | | | | | |--- weights: [5.19, 0.00] class: 0 | | | | | | | |--- lead_time > 81.00 | | | | | | | | |--- lead_time <= 88.50 | | | | | | | | | |--- weights: [13.35, 0.00] class: 0 | | | | | | | | |--- lead_time > 88.50 | | | | | | | | | |--- weights: [0.74, 1.53] class: 1 | | | |--- lead_time > 90.50 | | | | |--- lead_time <= 117.50 | | | | | |--- arrival_month <= 10.50 | | | | | | |--- avg_price_per_room <= 92.72 | | | | | | | |--- avg_price_per_room <= 75.07 | | | | | | | | |--- no_of_week_nights <= 2.50 | | | | | | | | | |--- lead_time <= 98.50 | | | | | | | | | | |--- arrival_month <= 6.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- arrival_month > 6.00 | | | | | | | | | | | |--- weights: [6.67, 0.00] class: 0 | | | | | | | | | |--- lead_time > 98.50 | | | | | | | | | | |--- avg_price_per_room <= 58.75 | | | | | | | | | | | |--- weights: [3.71, 0.00] class: 0 | | | | | | | | | | |--- avg_price_per_room > 58.75 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | | |--- no_of_week_nights > 2.50 | | | | | | | | | |--- arrival_date <= 11.50 | | | | | | | | | | |--- weights: [11.12, 0.00] class: 0 | | | | | | | | | |--- arrival_date > 11.50 | | | | | | | | | | |--- no_of_week_nights <= 5.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- no_of_week_nights > 5.50 | | | | | | | | | | | |--- weights: [0.74, 4.60] class: 1 | | | | | | | |--- avg_price_per_room > 75.07 | | | | | | | | |--- arrival_month <= 3.50 | | | | | | | | | |--- avg_price_per_room <= 88.50 | | | | | | | | | | |--- weights: [51.17, 1.53] class: 0 | | | | | | | | | |--- avg_price_per_room > 88.50 | | | | | | | | | | |--- weights: [0.74, 1.53] class: 1 | | | | | | | | |--- arrival_month > 3.50 | | | | | | | | | |--- 
arrival_month <= 4.50 | | | | | | | | | | |--- avg_price_per_room <= 80.38 | | | | | | | | | | | |--- weights: [0.00, 16.88] class: 1 | | | | | | | | | | |--- avg_price_per_room > 80.38 | | | | | | | | | | | |--- weights: [2.97, 0.00] class: 0 | | | | | | | | | |--- arrival_month > 4.50 | | | | | | | | | | |--- arrival_date <= 15.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- arrival_date > 15.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | |--- avg_price_per_room > 92.72 | | | | | | | |--- arrival_date <= 11.50 | | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | | |--- lead_time <= 110.50 | | | | | | | | | | |--- weights: [9.64, 9.21] class: 0 | | | | | | | | | |--- lead_time > 110.50 | | | | | | | | | | |--- weights: [3.71, 21.48] class: 1 | | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | | |--- avg_price_per_room <= 118.43 | | | | | | | | | | |--- avg_price_per_room <= 95.83 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- avg_price_per_room > 95.83 | | | | | | | | | | | |--- weights: [19.28, 1.53] class: 0 | | | | | | | | | |--- avg_price_per_room > 118.43 | | | | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | | | | |--- weights: [3.71, 0.00] class: 0 | | | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | | | |--- weights: [0.00, 6.14] class: 1 | | | | | | | |--- arrival_date > 11.50 | | | | | | | | |--- avg_price_per_room <= 108.50 | | | | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | | | | | |--- no_of_weekend_nights <= 0.50 | | | | | | | | | | | |--- weights: [0.00, 121.23] class: 1 | | | | | | | | | | |--- no_of_weekend_nights > 0.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | | | | |--- weights: [2.22, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 108.50 | | | | | | | | | |--- avg_price_per_room <= 109.50 | | | | | | 
| | | | |--- weights: [28.18, 0.00] class: 0 | | | | | | | | | |--- avg_price_per_room > 109.50 | | | | | | | | | | |--- no_of_adults <= 2.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- no_of_adults > 2.50 | | | | | | | | | | | |--- weights: [2.22, 0.00] class: 0 | | | | | |--- arrival_month > 10.50 | | | | | | |--- avg_price_per_room <= 185.50 | | | | | | | |--- weights: [40.79, 1.53] class: 0 | | | | | | |--- avg_price_per_room > 185.50 | | | | | | | |--- weights: [0.00, 1.53] class: 1 | | | | |--- lead_time > 117.50 | | | | | |--- avg_price_per_room <= 89.88 | | | | | | |--- room_type_reserved_Room_Type 2 <= 0.50 | | | | | | | |--- no_of_weekend_nights <= 4.00 | | | | | | | | |--- avg_price_per_room <= 64.38 | | | | | | | | | |--- arrival_date <= 19.50 | | | | | | | | | | |--- weights: [4.45, 0.00] class: 0 | | | | | | | | | |--- arrival_date > 19.50 | | | | | | | | | | |--- no_of_week_nights <= 2.50 | | | | | | | | | | | |--- weights: [0.74, 4.60] class: 1 | | | | | | | | | | |--- no_of_week_nights > 2.50 | | | | | | | | | | | |--- weights: [2.22, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 64.38 | | | | | | | | | |--- arrival_date <= 16.50 | | | | | | | | | | |--- arrival_date <= 14.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- arrival_date > 14.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- arrival_date > 16.50 | | | | | | | | | | |--- weights: [97.16, 1.53] class: 0 | | | | | | | |--- no_of_weekend_nights > 4.00 | | | | | | | | |--- weights: [0.00, 1.53] class: 1 | | | | | | |--- room_type_reserved_Room_Type 2 > 0.50 | | | | | | | |--- arrival_date <= 10.50 | | | | | | | | |--- weights: [1.48, 0.00] class: 0 | | | | | | | |--- arrival_date > 10.50 | | | | | | | | |--- weights: [0.00, 3.07] class: 1 | | | | | |--- avg_price_per_room > 89.88 | | | | | | |--- arrival_date <= 7.50 | | | | | | | |--- avg_price_per_room <= 115.03 | | | | | | | | 
|--- weights: [53.40, 0.00] class: 0 | | | | | | | |--- avg_price_per_room > 115.03 | | | | | | | | |--- arrival_month <= 6.50 | | | | | | | | | |--- lead_time <= 144.00 | | | | | | | | | | |--- weights: [5.19, 0.00] class: 0 | | | | | | | | | |--- lead_time > 144.00 | | | | | | | | | | |--- weights: [0.00, 1.53] class: 1 | | | | | | | | |--- arrival_month > 6.50 | | | | | | | | | |--- weights: [0.00, 3.07] class: 1 | | | | | | |--- arrival_date > 7.50 | | | | | | | |--- arrival_date <= 24.50 | | | | | | | | |--- arrival_date <= 23.00 | | | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | | | |--- avg_price_per_room <= 92.25 | | | | | | | | | | | |--- weights: [5.19, 4.60] class: 0 | | | | | | | | | | |--- avg_price_per_room > 92.25 | | | | | | | | | | | |--- weights: [12.61, 52.18] class: 1 | | | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | | | |--- lead_time <= 147.50 | | | | | | | | | | | |--- weights: [16.32, 3.07] class: 0 | | | | | | | | | | |--- lead_time > 147.50 | | | | | | | | | | | |--- weights: [0.74, 4.60] class: 1 | | | | | | | | |--- arrival_date > 23.00 | | | | | | | | | |--- weights: [0.00, 42.97] class: 1 | | | | | | | |--- arrival_date > 24.50 | | | | | | | | |--- lead_time <= 150.50 | | | | | | | | | |--- weights: [29.67, 4.60] class: 0 | | | | | | | | |--- lead_time > 150.50 | | | | | | | | | |--- weights: [0.00, 4.60] class: 1 | | |--- market_segment_type_Online > 0.50 | | | |--- lead_time <= 9.50 | | | | |--- lead_time <= 3.50 | | | | | |--- avg_price_per_room <= 202.67 | | | | | | |--- arrival_month <= 5.50 | | | | | | | |--- no_of_weekend_nights <= 1.50 | | | | | | | | |--- arrival_month <= 1.50 | | | | | | | | | |--- weights: [42.27, 0.00] class: 0 | | | | | | | | |--- arrival_month > 1.50 | | | | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | | | | | |--- avg_price_per_room <= 130.00 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- avg_price_per_room > 130.00 | | | | 
| | | | | | | |--- truncated branch of depth 3 | | | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | | | | |--- lead_time <= 2.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- lead_time > 2.50 | | | | | | | | | | | |--- weights: [0.00, 6.14] class: 1 | | | | | | | |--- no_of_weekend_nights > 1.50 | | | | | | | | |--- arrival_month <= 1.50 | | | | | | | | | |--- weights: [10.38, 0.00] class: 0 | | | | | | | | |--- arrival_month > 1.50 | | | | | | | | | |--- avg_price_per_room <= 80.00 | | | | | | | | | | |--- weights: [0.74, 24.55] class: 1 | | | | | | | | | |--- avg_price_per_room > 80.00 | | | | | | | | | | |--- avg_price_per_room <= 130.10 | | | | | | | | | | | |--- weights: [8.16, 1.53] class: 0 | | | | | | | | | | |--- avg_price_per_room > 130.10 | | | | | | | | | | | |--- weights: [0.00, 4.60] class: 1 | | | | | | |--- arrival_month > 5.50 | | | | | | | |--- avg_price_per_room <= 137.50 | | | | | | | | |--- no_of_week_nights <= 8.50 | | | | | | | | | |--- avg_price_per_room <= 118.90 | | | | | | | | | | |--- avg_price_per_room <= 118.04 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- avg_price_per_room > 118.04 | | | | | | | | | | | |--- weights: [0.00, 3.07] class: 1 | | | | | | | | | |--- avg_price_per_room > 118.90 | | | | | | | | | | |--- weights: [49.69, 0.00] class: 0 | | | | | | | | |--- no_of_week_nights > 8.50 | | | | | | | | | |--- weights: [0.00, 1.53] class: 1 | | | | | | | |--- avg_price_per_room > 137.50 | | | | | | | | |--- avg_price_per_room <= 139.50 | | | | | | | | | |--- arrival_month <= 7.50 | | | | | | | | | | |--- weights: [0.00, 6.14] class: 1 | | | | | | | | | |--- arrival_month > 7.50 | | | | | | | | | | |--- arrival_date <= 12.50 | | | | | | | | | | | |--- weights: [2.22, 4.60] class: 1 | | | | | | | | | | |--- arrival_date > 12.50 | | | | | | | | | | | |--- weights: [5.93, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 139.50 | | | | | | 
| | | |--- lead_time <= 2.50 | | | | | | | | | | |--- room_type_reserved_Room_Type 6 <= 0.50 | | | | | | | | | | | |--- weights: [49.69, 3.07] class: 0 | | | | | | | | | | |--- room_type_reserved_Room_Type 6 > 0.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | |--- lead_time > 2.50 | | | | | | | | | | |--- avg_price_per_room <= 162.58 | | | | | | | | | | | |--- weights: [5.93, 0.00] class: 0 | | | | | | | | | | |--- avg_price_per_room > 162.58 | | | | | | | | | | | |--- weights: [0.74, 4.60] class: 1 | | | | | |--- avg_price_per_room > 202.67 | | | | | | |--- weights: [0.00, 16.88] class: 1 | | | | |--- lead_time > 3.50 | | | | | |--- arrival_month <= 8.50 | | | | | | |--- avg_price_per_room <= 99.17 | | | | | | | |--- arrival_month <= 1.50 | | | | | | | | |--- weights: [20.77, 0.00] class: 0 | | | | | | | |--- arrival_month > 1.50 | | | | | | | | |--- arrival_date <= 3.50 | | | | | | | | | |--- weights: [8.16, 0.00] class: 0 | | | | | | | | |--- arrival_date > 3.50 | | | | | | | | | |--- no_of_weekend_nights <= 1.50 | | | | | | | | | | |--- weights: [37.08, 29.16] class: 0 | | | | | | | | | |--- no_of_weekend_nights > 1.50 | | | | | | | | | | |--- avg_price_per_room <= 85.28 | | | | | | | | | | | |--- weights: [0.00, 9.21] class: 1 | | | | | | | | | | |--- avg_price_per_room > 85.28 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | |--- avg_price_per_room > 99.17 | | | | | | | |--- arrival_date <= 21.50 | | | | | | | | |--- weights: [21.51, 124.30] class: 1 | | | | | | | |--- arrival_date > 21.50 | | | | | | | | |--- no_of_weekend_nights <= 1.50 | | | | | | | | | |--- arrival_month <= 2.50 | | | | | | | | | | |--- weights: [2.97, 0.00] class: 0 | | | | | | | | | |--- arrival_month > 2.50 | | | | | | | | | | |--- avg_price_per_room <= 148.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- avg_price_per_room > 148.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | |--- 
no_of_weekend_nights > 1.50 | | | | | | | | | |--- weights: [0.00, 12.28] class: 1 | | | | | |--- arrival_month > 8.50 | | | | | | |--- no_of_week_nights <= 4.50 | | | | | | | |--- avg_price_per_room <= 126.80 | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | |--- weights: [68.97, 1.53] class: 0 | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | | |--- avg_price_per_room <= 95.10 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- avg_price_per_room > 95.10 | | | | | | | | | | | |--- weights: [6.67, 0.00] class: 0 | | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | | |--- weights: [14.09, 0.00] class: 0 | | | | | | | |--- avg_price_per_room > 126.80 | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | |--- arrival_date <= 18.50 | | | | | | | | | | |--- arrival_month <= 9.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- arrival_month > 9.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- arrival_date > 18.50 | | | | | | | | | | |--- avg_price_per_room <= 208.22 | | | | | | | | | | | |--- weights: [8.16, 0.00] class: 0 | | | | | | | | | | |--- avg_price_per_room > 208.22 | | | | | | | | | | | |--- weights: [0.00, 1.53] class: 1 | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | |--- weights: [6.67, 0.00] class: 0 | | | | | | |--- no_of_week_nights > 4.50 | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | |--- weights: [2.22, 0.00] class: 0 | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | |--- weights: [0.00, 9.21] class: 1 | | | |--- lead_time > 9.50 | | | | |--- avg_price_per_room <= 99.82 | | | | | |--- lead_time <= 26.50 | | | | | | |--- arrival_month <= 11.50 | | | | | | | |--- arrival_month <= 1.50 | | | | | | | | |--- lead_time <= 24.50 | | | | | | | | | |--- weights: [30.41, 0.00] class: 0 | | | | | | | | |--- lead_time > 
24.50 | | | | | | | | | |--- weights: [0.00, 1.53] class: 1 | | | | | | | |--- arrival_month > 1.50 | | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | | |--- weights: [21.51, 3.07] class: 0 | | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | | |--- avg_price_per_room <= 27.25 | | | | | | | | | | |--- weights: [4.45, 0.00] class: 0 | | | | | | | | | |--- avg_price_per_room > 27.25 | | | | | | | | | | |--- lead_time <= 13.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- lead_time > 13.50 | | | | | | | | | | | |--- truncated branch of depth 10 | | | | | | |--- arrival_month > 11.50 | | | | | | | |--- lead_time <= 25.50 | | | | | | | | |--- weights: [60.07, 0.00] class: 0 | | | | | | | |--- lead_time > 25.50 | | | | | | | | |--- weights: [0.00, 4.60] class: 1 | | | | | |--- lead_time > 26.50 | | | | | | |--- avg_price_per_room <= 29.10 | | | | | | | |--- weights: [20.02, 0.00] class: 0 | | | | | | |--- avg_price_per_room > 29.10 | | | | | | | |--- required_car_parking_space <= 0.50 | | | | | | | | |--- avg_price_per_room <= 72.83 | | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | | |--- avg_price_per_room <= 71.24 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | | | | |--- avg_price_per_room > 71.24 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | | |--- avg_price_per_room <= 70.62 | | | | | | | | | | | |--- weights: [10.38, 0.00] class: 0 | | | | | | | | | | |--- avg_price_per_room > 70.62 | | | | | | | | | | | |--- weights: [0.00, 1.53] class: 1 | | | | | | | | |--- avg_price_per_room > 72.83 | | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | | |--- avg_price_per_room <= 80.05 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | | |--- avg_price_per_room > 80.05 | | | | | | | | | | | |--- truncated branch of depth 13 | | | | | | | | | |--- type_of_meal_plan_Not 
Selected > 0.50 | | | | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | | | | |--- weights: [2.22, 0.00] class: 0 | | | | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | |--- required_car_parking_space > 0.50 | | | | | | | | |--- weights: [11.87, 0.00] class: 0 | | | | |--- avg_price_per_room > 99.82 | | | | | |--- required_car_parking_space <= 0.50 | | | | | | |--- arrival_year <= 2017.50 | | | | | | | |--- type_of_meal_plan_Meal Plan 2 <= 0.50 | | | | | | | | |--- avg_price_per_room <= 102.56 | | | | | | | | | |--- weights: [0.74, 3.07] class: 1 | | | | | | | | |--- avg_price_per_room > 102.56 | | | | | | | | | |--- avg_price_per_room <= 115.88 | | | | | | | | | | |--- weights: [21.51, 0.00] class: 0 | | | | | | | | | |--- avg_price_per_room > 115.88 | | | | | | | | | | |--- avg_price_per_room <= 116.97 | | | | | | | | | | | |--- weights: [0.00, 3.07] class: 1 | | | | | | | | | | |--- avg_price_per_room > 116.97 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | |--- type_of_meal_plan_Meal Plan 2 > 0.50 | | | | | | | | |--- weights: [0.74, 26.09] class: 1 | | | | | | |--- arrival_year > 2017.50 | | | | | | | |--- arrival_month <= 10.50 | | | | | | | | |--- room_type_reserved_Room_Type 5 <= 0.50 | | | | | | | | | |--- avg_price_per_room <= 195.30 | | | | | | | | | | |--- lead_time <= 48.50 | | | | | | | | | | | |--- truncated branch of depth 11 | | | | | | | | | | |--- lead_time > 48.50 | | | | | | | | | | | |--- truncated branch of depth 10 | | | | | | | | | |--- avg_price_per_room > 195.30 | | | | | | | | | | |--- weights: [1.48, 144.25] class: 1 | | | | | | | | |--- room_type_reserved_Room_Type 5 > 0.50 | | | | | | | | | |--- no_of_weekend_nights <= 0.50 | | | | | | | | | | |--- no_of_week_nights <= 2.50 | | | | | | | | | | | |--- weights: [0.00, 6.14] class: 1 | | | | | | | | | | |--- no_of_week_nights > 2.50 | | | | | | | | | | | |--- weights: [3.71, 3.07] class: 0 | | | | 
| | | | | |--- no_of_weekend_nights > 0.50 | | | | | | | | | | |--- weights: [8.90, 3.07] class: 0 | | | | | | | |--- arrival_month > 10.50 | | | | | | | | |--- lead_time <= 23.50 | | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | | |--- arrival_date <= 6.50 | | | | | | | | | | | |--- weights: [2.97, 0.00] class: 0 | | | | | | | | | | |--- arrival_date > 6.50 | | | | | | | | | | | |--- weights: [2.97, 16.88] class: 1 | | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | | |--- weights: [20.77, 0.00] class: 0 | | | | | | | | |--- lead_time > 23.50 | | | | | | | | | |--- arrival_date <= 7.50 | | | | | | | | | | |--- no_of_children <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- no_of_children > 0.50 | | | | | | | | | | | |--- weights: [0.74, 13.81] class: 1 | | | | | | | | | |--- arrival_date > 7.50 | | | | | | | | | | |--- weights: [7.42, 85.94] class: 1 | | | | | |--- required_car_parking_space > 0.50 | | | | | | |--- no_of_week_nights <= 9.00 | | | | | | | |--- avg_price_per_room <= 209.83 | | | | | | | | |--- weights: [36.34, 0.00] class: 0 | | | | | | | |--- avg_price_per_room > 209.83 | | | | | | | | |--- weights: [0.00, 1.53] class: 1 | | | | | | |--- no_of_week_nights > 9.00 | | | | | | | |--- weights: [0.00, 1.53] class: 1 | |--- no_of_special_requests > 0.50 | | |--- no_of_special_requests <= 1.50 | | | |--- market_segment_type_Online <= 0.50 | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | |--- no_of_weekend_nights <= 2.50 | | | | | | |--- lead_time <= 91.50 | | | | | | | |--- weights: [664.51, 1.53] class: 0 | | | | | | |--- lead_time > 91.50 | | | | | | | |--- no_of_week_nights <= 2.50 | | | | | | | | |--- arrival_date <= 9.00 | | | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | | | |--- weights: [0.00, 6.14] class: 1 | | | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | | | |--- arrival_date <= 6.50 | | | | | | | | | | | |--- weights: [3.71, 0.00] class: 0 | | | 
| | | | | | | |--- arrival_date > 6.50 | | | | | | | | | | | |--- weights: [0.00, 1.53] class: 1 | | | | | | | | |--- arrival_date > 9.00 | | | | | | | | | |--- lead_time <= 142.50 | | | | | | | | | | |--- no_of_children <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- no_of_children > 0.50 | | | | | | | | | | | |--- weights: [0.00, 1.53] class: 1 | | | | | | | | | |--- lead_time > 142.50 | | | | | | | | | | |--- weights: [1.48, 3.07] class: 1 | | | | | | | |--- no_of_week_nights > 2.50 | | | | | | | | |--- weights: [71.20, 3.07] class: 0 | | | | | |--- no_of_weekend_nights > 2.50 | | | | | | |--- weights: [0.00, 3.07] class: 1 | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | |--- lead_time <= 60.50 | | | | | | |--- weights: [13.35, 1.53] class: 0 | | | | | |--- lead_time > 60.50 | | | | | | |--- weights: [0.74, 7.67] class: 1 | | | |--- market_segment_type_Online > 0.50 | | | | |--- lead_time <= 9.50 | | | | | |--- lead_time <= 4.50 | | | | | | |--- avg_price_per_room <= 157.93 | | | | | | | |--- no_of_weekend_nights <= 5.00 | | | | | | | | |--- weights: [452.40, 19.95] class: 0 | | | | | | | |--- no_of_weekend_nights > 5.00 | | | | | | | | |--- weights: [0.74, 1.53] class: 1 | | | | | | |--- avg_price_per_room > 157.93 | | | | | | | |--- avg_price_per_room <= 160.25 | | | | | | | | |--- arrival_date <= 16.00 | | | | | | | | | |--- weights: [2.97, 0.00] class: 0 | | | | | | | | |--- arrival_date > 16.00 | | | | | | | | | |--- weights: [0.74, 4.60] class: 1 | | | | | | | |--- avg_price_per_room > 160.25 | | | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | | | | |--- weights: [37.08, 1.53] class: 0 | | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | | | |--- arrival_date <= 17.50 | | | | | | | | | | |--- arrival_date <= 9.50 | | | | | | | | | | | |--- weights: [2.22, 0.00] class: 0 | | | | | | | | | | |--- arrival_date > 9.50 | | | | | | | | | | | |--- weights: [1.48, 6.14] 
class: 1 | | | | | | | | | |--- arrival_date > 17.50 | | | | | | | | | | |--- arrival_month <= 5.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- arrival_month > 5.50 | | | | | | | | | | | |--- weights: [13.35, 0.00] class: 0 | | | | | |--- lead_time > 4.50 | | | | | | |--- avg_price_per_room <= 123.50 | | | | | | | |--- arrival_month <= 8.50 | | | | | | | | |--- arrival_date <= 12.50 | | | | | | | | | |--- no_of_week_nights <= 2.50 | | | | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | |--- no_of_week_nights > 2.50 | | | | | | | | | | |--- weights: [10.38, 0.00] class: 0 | | | | | | | | |--- arrival_date > 12.50 | | | | | | | | | |--- weights: [79.36, 9.21] class: 0 | | | | | | | |--- arrival_month > 8.50 | | | | | | | | |--- weights: [77.13, 1.53] class: 0 | | | | | | |--- avg_price_per_room > 123.50 | | | | | | | |--- arrival_month <= 5.50 | | | | | | | | |--- arrival_date <= 6.50 | | | | | | | | | |--- avg_price_per_room <= 146.42 | | | | | | | | | | |--- weights: [0.74, 3.07] class: 1 | | | | | | | | | |--- avg_price_per_room > 146.42 | | | | | | | | | | |--- weights: [2.22, 0.00] class: 0 | | | | | | | | |--- arrival_date > 6.50 | | | | | | | | | |--- weights: [27.44, 0.00] class: 0 | | | | | | | |--- arrival_month > 5.50 | | | | | | | | |--- avg_price_per_room <= 129.50 | | | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | | | |--- weights: [2.97, 10.74] class: 1 | | | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | | | |--- arrival_month <= 7.00 | | | | | | | | | | | |--- weights: [0.00, 1.53] class: 1 | | | | | | | | | | |--- arrival_month > 7.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | |--- avg_price_per_room > 129.50 | | | | | | | | | |--- avg_price_per_room <= 139.57 | | | | | | | | | | |--- 
weights: [18.54, 0.00] class: 0 | | | | | | | | | |--- avg_price_per_room > 139.57 | | | | | | | | | | |--- avg_price_per_room <= 167.75 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | | |--- avg_price_per_room > 167.75 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | |--- lead_time > 9.50 | | | | | |--- required_car_parking_space <= 0.50 | | | | | | |--- avg_price_per_room <= 118.30 | | | | | | | |--- no_of_weekend_nights <= 2.50 | | | | | | | | |--- avg_price_per_room <= 67.36 | | | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | | | |--- lead_time <= 100.50 | | | | | | | | | | | |--- weights: [12.61, 1.53] class: 0 | | | | | | | | | | |--- lead_time > 100.50 | | | | | | | | | | | |--- weights: [1.48, 3.07] class: 1 | | | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | | | |--- weights: [100.12, 3.07] class: 0 | | | | | | | | |--- avg_price_per_room > 67.36 | | | | | | | | | |--- lead_time <= 67.50 | | | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | | | |--- truncated branch of depth 19 | | | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | | | |--- weights: [120.89, 0.00] class: 0 | | | | | | | | | |--- lead_time > 67.50 | | | | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | | | | |--- truncated branch of depth 13 | | | | | | | |--- no_of_weekend_nights > 2.50 | | | | | | | | |--- lead_time <= 100.50 | | | | | | | | | |--- weights: [2.22, 30.69] class: 1 | | | | | | | | |--- lead_time > 100.50 | | | | | | | | | |--- weights: [4.45, 0.00] class: 0 | | | | | | |--- avg_price_per_room > 118.30 | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | |--- arrival_month <= 9.50 | | | | | | | | | |--- arrival_date <= 17.50 | | | | | | | | | | |--- weights: [5.93, 0.00] class: 0 | | | | | | | | | |--- arrival_date > 17.50 | | | | | | | | | | |--- arrival_date <= 20.00 | | | | | 
| | | | | | |--- weights: [0.74, 6.14] class: 1 | | | | | | | | | | |--- arrival_date > 20.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | |--- arrival_month > 9.50 | | | | | | | | | |--- weights: [39.31, 0.00] class: 0 | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | |--- arrival_month <= 8.50 | | | | | | | | | |--- arrival_month <= 3.50 | | | | | | | | | | |--- lead_time <= 18.50 | | | | | | | | | | | |--- weights: [10.38, 1.53] class: 0 | | | | | | | | | | |--- lead_time > 18.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | |--- arrival_month > 3.50 | | | | | | | | | | |--- no_of_week_nights <= 7.50 | | | | | | | | | | | |--- truncated branch of depth 16 | | | | | | | | | | |--- no_of_week_nights > 7.50 | | | | | | | | | | | |--- weights: [0.00, 7.67] class: 1 | | | | | | | | |--- arrival_month > 8.50 | | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 13 | | | | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | | | | | |--- truncated branch of depth 10 | | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | | |--- lead_time <= 100.00 | | | | | | | | | | | |--- weights: [53.40, 0.00] class: 0 | | | | | | | | | | |--- lead_time > 100.00 | | | | | | | | | | | |--- weights: [1.48, 18.41] class: 1 | | | | | |--- required_car_parking_space > 0.50 | | | | | | |--- room_type_reserved_Room_Type 7 <= 0.50 | | | | | | | |--- lead_time <= 150.00 | | | | | | | | |--- weights: [123.11, 0.00] class: 0 | | | | | | | |--- lead_time > 150.00 | | | | | | | | |--- weights: [0.00, 1.53] class: 1 | | | | | | |--- room_type_reserved_Room_Type 7 > 0.50 | | | | | | | |--- weights: [0.00, 1.53] class: 1 | | |--- no_of_special_requests > 1.50 | | | |--- lead_time <= 89.50 | | | | |--- no_of_week_nights <= 3.50 | | | | | |--- weights: [1572.28, 0.00] class: 0 | | | | |--- no_of_week_nights > 
3.50 | | | | | |--- no_of_week_nights <= 9.50 | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | | |--- lead_time <= 6.50 | | | | | | | | |--- weights: [29.67, 0.00] class: 0 | | | | | | | |--- lead_time > 6.50 | | | | | | | | |--- no_of_special_requests <= 2.50 | | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | | |--- arrival_date <= 2.50 | | | | | | | | | | | |--- weights: [11.87, 0.00] class: 0 | | | | | | | | | | |--- arrival_date > 2.50 | | | | | | | | | | | |--- truncated branch of depth 10 | | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | | |--- weights: [12.61, 0.00] class: 0 | | | | | | | | |--- no_of_special_requests > 2.50 | | | | | | | | | |--- weights: [26.70, 0.00] class: 0 | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | |--- weights: [66.75, 3.07] class: 0 | | | | | |--- no_of_week_nights > 9.50 | | | | | | |--- weights: [0.74, 7.67] class: 1 | | | |--- lead_time > 89.50 | | | | |--- avg_price_per_room <= 202.14 | | | | | |--- no_of_special_requests <= 2.50 | | | | | | |--- arrival_month <= 8.50 | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | |--- arrival_month <= 7.50 | | | | | | | | | |--- weights: [1.48, 10.74] class: 1 | | | | | | | | |--- arrival_month > 7.50 | | | | | | | | | |--- arrival_date <= 14.00 | | | | | | | | | | |--- weights: [5.93, 0.00] class: 0 | | | | | | | | | |--- arrival_date > 14.00 | | | | | | | | | | |--- weights: [1.48, 4.60] class: 1 | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | |--- lead_time <= 150.50 | | | | | | | | | |--- arrival_date <= 2.50 | | | | | | | | | | |--- arrival_date <= 1.50 | | | | | | | | | | | |--- weights: [6.67, 1.53] class: 0 | | | | | | | | | | |--- arrival_date > 1.50 | | | | | | | | | | | |--- weights: [1.48, 4.60] class: 1 | | | | | | | | | |--- arrival_date > 2.50 | | | | | | | | | | |--- lead_time <= 141.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- lead_time > 141.50 | | 
| | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | |--- lead_time > 150.50 | | | | | | | | | |--- weights: [0.00, 6.14] class: 1 | | | | | | |--- arrival_month > 8.50 | | | | | | | |--- avg_price_per_room <= 140.49 | | | | | | | | |--- no_of_week_nights <= 0.50 | | | | | | | | | |--- arrival_month <= 10.50 | | | | | | | | | | |--- weights: [2.97, 0.00] class: 0 | | | | | | | | | |--- arrival_month > 10.50 | | | | | | | | | | |--- weights: [0.00, 12.28] class: 1 | | | | | | | | |--- no_of_week_nights > 0.50 | | | | | | | | | |--- no_of_weekend_nights <= 1.50 | | | | | | | | | | |--- avg_price_per_room <= 74.08 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- avg_price_per_room > 74.08 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | | |--- no_of_weekend_nights > 1.50 | | | | | | | | | | |--- no_of_week_nights <= 3.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | | |--- no_of_week_nights > 3.50 | | | | | | | | | | | |--- weights: [2.97, 9.21] class: 1 | | | | | | | |--- avg_price_per_room > 140.49 | | | | | | | | |--- weights: [25.22, 7.67] class: 0 | | | | | |--- no_of_special_requests > 2.50 | | | | | | |--- weights: [59.33, 0.00] class: 0 | | | | |--- avg_price_per_room > 202.14 | | | | | |--- weights: [0.00, 13.81] class: 1 |--- lead_time > 151.50 | |--- avg_price_per_room <= 100.04 | | |--- no_of_special_requests <= 0.50 | | | |--- no_of_adults <= 1.50 | | | | |--- market_segment_type_Online <= 0.50 | | | | | |--- lead_time <= 163.50 | | | | | | |--- arrival_date <= 7.00 | | | | | | | |--- weights: [0.00, 21.48] class: 1 | | | | | | |--- arrival_date > 7.00 | | | | | | | |--- weights: [5.19, 0.00] class: 0 | | | | | |--- lead_time > 163.50 | | | | | | |--- lead_time <= 341.00 | | | | | | | |--- lead_time <= 173.00 | | | | | | | | |--- avg_price_per_room <= 97.50 | | | | | | | | | |--- avg_price_per_room <= 70.85 | | | | | | | | | | |--- weights: [1.48, 0.00] class: 0 
| | | | | | | | | |--- avg_price_per_room > 70.85 | | | | | | | | | | |--- weights: [0.00, 16.88] class: 1 | | | | | | | | |--- avg_price_per_room > 97.50 | | | | | | | | | |--- weights: [44.50, 6.14] class: 0 | | | | | | | |--- lead_time > 173.00 | | | | | | | | |--- avg_price_per_room <= 98.00 | | | | | | | | | |--- arrival_month <= 5.50 | | | | | | | | | | |--- arrival_date <= 7.50 | | | | | | | | | | | |--- weights: [0.00, 3.07] class: 1 | | | | | | | | | | |--- arrival_date > 7.50 | | | | | | | | | | | |--- weights: [7.42, 1.53] class: 0 | | | | | | | | | |--- arrival_month > 5.50 | | | | | | | | | | |--- weights: [182.44, 3.07] class: 0 | | | | | | | | |--- avg_price_per_room > 98.00 | | | | | | | | | |--- weights: [0.74, 7.67] class: 1 | | | | | | |--- lead_time > 341.00 | | | | | | | |--- avg_price_per_room <= 75.00 | | | | | | | | |--- weights: [3.71, 0.00] class: 0 | | | | | | | |--- avg_price_per_room > 75.00 | | | | | | | | |--- avg_price_per_room <= 88.00 | | | | | | | | | |--- weights: [0.00, 9.21] class: 1 | | | | | | | | |--- avg_price_per_room > 88.00 | | | | | | | | | |--- weights: [8.90, 12.28] class: 1 | | | | |--- market_segment_type_Online > 0.50 | | | | | |--- avg_price_per_room <= 2.50 | | | | | | |--- lead_time <= 285.50 | | | | | | | |--- weights: [7.42, 1.53] class: 0 | | | | | | |--- lead_time > 285.50 | | | | | | | |--- weights: [0.74, 4.60] class: 1 | | | | | |--- avg_price_per_room > 2.50 | | | | | | |--- weights: [1.48, 85.94] class: 1 | | | |--- no_of_adults > 1.50 | | | | |--- avg_price_per_room <= 82.47 | | | | | |--- market_segment_type_Offline <= 0.50 | | | | | | |--- weights: [3.71, 319.19] class: 1 | | | | | |--- market_segment_type_Offline > 0.50 | | | | | | |--- arrival_month <= 11.50 | | | | | | | |--- lead_time <= 244.00 | | | | | | | | |--- no_of_weekend_nights <= 0.50 | | | | | | | | | |--- lead_time <= 165.00 | | | | | | | | | | |--- weights: [12.61, 1.53] class: 0 | | | | | | | | | |--- lead_time > 165.00 | | | | | | | 
| | | |--- arrival_date <= 19.00 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- arrival_date > 19.00 | | | | | | | | | | | |--- weights: [2.22, 0.00] class: 0 | | | | | | | | |--- no_of_weekend_nights > 0.50 | | | | | | | | | |--- lead_time <= 178.50 | | | | | | | | | | |--- lead_time <= 170.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- lead_time > 170.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- lead_time > 178.50 | | | | | | | | | | |--- avg_price_per_room <= 27.77 | | | | | | | | | | | |--- weights: [0.00, 3.07] class: 1 | | | | | | | | | | |--- avg_price_per_room > 27.77 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | |--- lead_time > 244.00 | | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | | |--- weights: [28.92, 0.00] class: 0 | | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | | |--- avg_price_per_room <= 80.38 | | | | | | | | | | |--- avg_price_per_room <= 76.00 | | | | | | | | | | | |--- weights: [5.19, 247.07] class: 1 | | | | | | | | | | |--- avg_price_per_room > 76.00 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | |--- avg_price_per_room > 80.38 | | | | | | | | | | |--- weights: [8.90, 0.00] class: 0 | | | | | | |--- arrival_month > 11.50 | | | | | | | |--- weights: [43.02, 0.00] class: 0 | | | | |--- avg_price_per_room > 82.47 | | | | | |--- no_of_adults <= 2.50 | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | | |--- lead_time <= 324.50 | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | |--- weights: [8.16, 954.51] class: 1 | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | |--- market_segment_type_Offline <= 0.50 | | | | | | | | | | |--- weights: [0.00, 12.28] class: 1 | | | | | | | | | |--- market_segment_type_Offline > 0.50 | | | | | | | | | | |--- weights: [3.71, 0.00] class: 0 | | | | | | | |--- lead_time > 324.50 | | | | | | | | |--- 
no_of_weekend_nights <= 1.50 | | | | | | | | | |--- weights: [1.48, 12.28] class: 1 | | | | | | | | |--- no_of_weekend_nights > 1.50 | | | | | | | | | |--- weights: [2.22, 0.00] class: 0 | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | |--- market_segment_type_Offline <= 0.50 | | | | | | | | |--- weights: [0.00, 16.88] class: 1 | | | | | | | |--- market_segment_type_Offline > 0.50 | | | | | | | | |--- weights: [4.45, 0.00] class: 0 | | | | | |--- no_of_adults > 2.50 | | | | | | |--- weights: [7.42, 0.00] class: 0 | | |--- no_of_special_requests > 0.50 | | | |--- no_of_weekend_nights <= 0.50 | | | | |--- lead_time <= 180.50 | | | | | |--- lead_time <= 158.50 | | | | | | |--- arrival_month <= 9.00 | | | | | | | |--- avg_price_per_room <= 98.81 | | | | | | | | |--- weights: [5.93, 0.00] class: 0 | | | | | | | |--- avg_price_per_room > 98.81 | | | | | | | | |--- weights: [0.00, 1.53] class: 1 | | | | | | |--- arrival_month > 9.00 | | | | | | | |--- weights: [0.74, 7.67] class: 1 | | | | | |--- lead_time > 158.50 | | | | | | |--- arrival_date <= 2.00 | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | |--- weights: [0.00, 3.07] class: 1 | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | |--- weights: [2.22, 0.00] class: 0 | | | | | | |--- arrival_date > 2.00 | | | | | | | |--- weights: [39.31, 1.53] class: 0 | | | | |--- lead_time > 180.50 | | | | | |--- no_of_special_requests <= 2.50 | | | | | | |--- market_segment_type_Offline <= 0.50 | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | |--- no_of_week_nights <= 0.50 | | | | | | | | | |--- weights: [1.48, 0.00] class: 0 | | | | | | | | |--- no_of_week_nights > 0.50 | | | | | | | | | |--- weights: [0.00, 178.01] class: 1 | | | | | | | |--- arrival_month > 11.50 | | | | | | | | |--- lead_time <= 300.00 | | | | | | | | | |--- no_of_children <= 0.50 | | | | | | | | | | |--- weights: [8.16, 6.14] class: 0 | | | | | | | | | |--- no_of_children > 0.50 | | | | | | | | | | |--- 
weights: [0.00, 4.60] class: 1 | | | | | | | | |--- lead_time > 300.00 | | | | | | | | | |--- weights: [0.00, 9.21] class: 1 | | | | | | |--- market_segment_type_Offline > 0.50 | | | | | | | |--- no_of_adults <= 2.50 | | | | | | | | |--- weights: [13.35, 1.53] class: 0 | | | | | | | |--- no_of_adults > 2.50 | | | | | | | | |--- weights: [0.00, 1.53] class: 1 | | | | | |--- no_of_special_requests > 2.50 | | | | | | |--- weights: [11.87, 0.00] class: 0 | | | |--- no_of_weekend_nights > 0.50 | | | | |--- market_segment_type_Online <= 0.50 | | | | | |--- weights: [120.15, 3.07] class: 0 | | | | |--- market_segment_type_Online > 0.50 | | | | | |--- arrival_month <= 11.50 | | | | | | |--- no_of_week_nights <= 9.50 | | | | | | | |--- avg_price_per_room <= 76.48 | | | | | | | | |--- arrival_date <= 27.00 | | | | | | | | | |--- weights: [40.05, 1.53] class: 0 | | | | | | | | |--- arrival_date > 27.00 | | | | | | | | | |--- lead_time <= 207.50 | | | | | | | | | | |--- weights: [3.71, 0.00] class: 0 | | | | | | | | | |--- lead_time > 207.50 | | | | | | | | | | |--- arrival_date <= 28.50 | | | | | | | | | | | |--- weights: [0.00, 3.07] class: 1 | | | | | | | | | | |--- arrival_date > 28.50 | | | | | | | | | | | |--- weights: [1.48, 0.00] class: 0 | | | | | | | |--- avg_price_per_room > 76.48 | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | |--- no_of_special_requests <= 2.50 | | | | | | | | | | |--- arrival_month <= 10.50 | | | | | | | | | | | |--- truncated branch of depth 15 | | | | | | | | | | |--- arrival_month > 10.50 | | | | | | | | | | | |--- weights: [1.48, 4.60] class: 1 | | | | | | | | | |--- no_of_special_requests > 2.50 | | | | | | | | | | |--- weights: [9.64, 0.00] class: 0 | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | |--- avg_price_per_room <= 80.42 | | | | | | | | | | |--- arrival_date <= 22.50 | | | | | | | | | | | |--- weights: [1.48, 0.00] class: 0 | | | | | | | | | | |--- arrival_date > 22.50 | | | 
| | | | | | | | |--- weights: [0.00, 9.21] class: 1 | | | | | | | | | |--- avg_price_per_room > 80.42 | | | | | | | | | | |--- avg_price_per_room <= 93.58 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- avg_price_per_room > 93.58 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | |--- no_of_week_nights > 9.50 | | | | | | | |--- weights: [0.00, 7.67] class: 1 | | | | | |--- arrival_month > 11.50 | | | | | | |--- avg_price_per_room <= 55.92 | | | | | | | |--- weights: [2.97, 0.00] class: 0 | | | | | | |--- avg_price_per_room > 55.92 | | | | | | | |--- no_of_children <= 0.50 | | | | | | | | |--- arrival_date <= 29.50 | | | | | | | | | |--- avg_price_per_room <= 80.19 | | | | | | | | | | |--- lead_time <= 195.50 | | | | | | | | | | | |--- weights: [1.48, 0.00] class: 0 | | | | | | | | | | |--- lead_time > 195.50 | | | | | | | | | | | |--- weights: [4.45, 23.02] class: 1 | | | | | | | | | |--- avg_price_per_room > 80.19 | | | | | | | | | | |--- weights: [5.19, 3.07] class: 0 | | | | | | | | |--- arrival_date > 29.50 | | | | | | | | | |--- weights: [0.00, 7.67] class: 1 | | | | | | | |--- no_of_children > 0.50 | | | | | | | | |--- weights: [2.22, 0.00] class: 0 | |--- avg_price_per_room > 100.04 | | |--- arrival_month <= 11.50 | | | |--- no_of_special_requests <= 2.50 | | | | |--- weights: [0.00, 3208.81] class: 1 | | | |--- no_of_special_requests > 2.50 | | | | |--- weights: [23.73, 0.00] class: 0 | | |--- arrival_month > 11.50 | | | |--- no_of_special_requests <= 0.50 | | | | |--- weights: [43.76, 0.00] class: 0 | | | |--- no_of_special_requests > 0.50 | | | | |--- arrival_date <= 24.50 | | | | | |--- weights: [3.71, 0.00] class: 0 | | | | |--- arrival_date > 24.50 | | | | | |--- no_of_weekend_nights <= 1.50 | | | | | | |--- avg_price_per_room <= 119.00 | | | | | | | |--- weights: [0.74, 9.21] class: 1 | | | | | | |--- avg_price_per_room > 119.00 | | | | | | | |--- no_of_week_nights <= 4.50 | | | | | | | | |--- weights: 
[3.71, 0.00] class: 0 | | | | | | | |--- no_of_week_nights > 4.50 | | | | | | | | |--- weights: [0.00, 1.53] class: 1 | | | | | |--- no_of_weekend_nights > 1.50 | | | | | | |--- weights: [0.00, 10.74] class: 1
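As a reading aid, a single path of the tree printed above can be written as a plain predicate. The sketch below encodes the leaf reached by `lead_time > 151.50`, `avg_price_per_room > 100.04`, `arrival_month <= 11.50` and `no_of_special_requests <= 2.50`, which the tree labels class 1 with weights [0.00, 3208.81]. The function name, the dictionary input and the two example bookings are illustrative assumptions, not part of the notebook.

```python
def matches_high_risk_leaf(booking):
    """Return True if a booking falls in the single leaf described above.

    `booking` is a hypothetical dict keyed by the notebook's feature names.
    Thresholds are copied from the printed tree; only this one leaf is covered.
    """
    return (
        booking["lead_time"] > 151.50
        and booking["avg_price_per_room"] > 100.04
        and booking["arrival_month"] <= 11.50
        and booking["no_of_special_requests"] <= 2.50
    )


# Made-up bookings, for illustration only
early_booking = {
    "lead_time": 200,
    "avg_price_per_room": 120.0,
    "arrival_month": 7,
    "no_of_special_requests": 0,
}
late_booking = {
    "lead_time": 3,
    "avg_price_per_room": 120.0,
    "arrival_month": 7,
    "no_of_special_requests": 0,
}

print(matches_high_risk_leaf(early_booking), matches_high_risk_leaf(late_booking))
```

A rule like this only restates one branch; the fitted model remains the source of truth for predictions.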
# Importance of features in the tree building. The importance of a feature is computed as the
# (normalized) total reduction of the criterion brought by that feature, also known as the Gini importance.
print(
    pd.DataFrame(
        best_model.feature_importances_, columns=["Imp"], index=X_train.columns
    ).sort_values(by="Imp", ascending=False)
)
| | Imp |
|---|---|
| lead_time | 0.369777 |
| avg_price_per_room | 0.139793 |
| market_segment_type_Online | 0.114696 |
| no_of_special_requests | 0.105753 |
| arrival_month | 0.069031 |
| arrival_date | 0.061318 |
| no_of_week_nights | 0.031124 |
| no_of_weekend_nights | 0.027429 |
| no_of_adults | 0.024574 |
| arrival_year | 0.015308 |
| market_segment_type_Offline | 0.011522 |
| required_car_parking_space | 0.008456 |
| room_type_reserved_Room_Type 4 | 0.006509 |
| type_of_meal_plan_Not Selected | 0.004302 |
| type_of_meal_plan_Meal Plan 2 | 0.003173 |
| no_of_children | 0.002990 |
| room_type_reserved_Room_Type 2 | 0.001855 |
| room_type_reserved_Room_Type 5 | 0.000653 |
| room_type_reserved_Room_Type 7 | 0.000630 |
| market_segment_type_Corporate | 0.000495 |
| repeated_guest | 0.000313 |
| room_type_reserved_Room_Type 6 | 0.000299 |
| no_of_previous_bookings_not_canceled | 0.000000 |
| no_of_previous_cancellations | 0.000000 |
| room_type_reserved_Room_Type 3 | 0.000000 |
| market_segment_type_Complementary | 0.000000 |
| type_of_meal_plan_Meal Plan 3 | 0.000000 |
importances = best_model.feature_importances_
indices = np.argsort(importances)
plt.figure(figsize=(12, 12))
plt.title("Feature Importances")
plt.barh(range(len(indices)), importances[indices], color="violet", align="center")
plt.yticks(range(len(indices)), [feature_names[i] for i in indices])
plt.xlabel("Relative Importance")
plt.show()
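To see how concentrated the importances are, a running total over the largest values can help. The sketch below is a standalone illustration that copies the six largest values from the printed output instead of reading them from `best_model`; the variable names are assumptions made for this example.

```python
from itertools import accumulate

# Importances copied from the printed output above (six largest values only)
top_importances = [
    ("lead_time", 0.369777),
    ("avg_price_per_room", 0.139793),
    ("market_segment_type_Online", 0.114696),
    ("no_of_special_requests", 0.105753),
    ("arrival_month", 0.069031),
    ("arrival_date", 0.061318),
]

names = [name for name, _ in top_importances]
values = [value for _, value in top_importances]

# Running total of importance, in descending order of importance
cumulative = dict(zip(names, accumulate(values)))

for name, total in cumulative.items():
    print(f"{name:<28} {total:.3f}")
```

On these numbers, the four largest features together account for roughly 73% of the total importance.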
Comparison of Models and Final Model Selection¶
# training performance comparison
models_train_comp_df = pd.concat(
    [
        decision_tree_perf_train.T,
        decision_tree_tune_perf_train.T,
        decision_tree_postpruned_perf_train.T,
    ],
    axis=1,
)
models_train_comp_df.columns = [
    "Decision Tree sklearn",
    "Decision Tree (Pre-Pruning)",
    "Decision Tree (Post-Pruning)",
]
print("Training performance comparison:")
models_train_comp_df
Training performance comparison:
| | Decision Tree sklearn | Decision Tree (Pre-Pruning) | Decision Tree (Post-Pruning) |
|---|---|---|---|
| Accuracy | 0.994109 | 0.838137 | 0.941525 |
| Recall | 0.985924 | 0.781337 | 0.953404 |
| Precision | 0.995955 | 0.737487 | 0.877681 |
| F1 | 0.990914 | 0.758779 | 0.913977 |
# test performance comparison
models_test_comp_df = pd.concat(
    [
        decision_tree_perf_test.T,
        decision_tree_tune_perf_test.T,
        decision_tree_postpruned_perf_test.T,
    ],
    axis=1,
)
models_test_comp_df.columns = [
    "Decision Tree sklearn",
    "Decision Tree (Pre-Pruning)",
    "Decision Tree (Post-Pruning)",
]
print("Test set performance comparison:")
models_test_comp_df
Test set performance comparison:
| | Decision Tree sklearn | Decision Tree (Pre-Pruning) | Decision Tree (Post-Pruning) |
|---|---|---|---|
| Accuracy | 0.869926 | 0.831827 | 0.871771 |
| Recall | 0.799667 | 0.779105 | 0.846624 |
| Precision | 0.806840 | 0.731733 | 0.784299 |
| F1 | 0.803238 | 0.754676 | 0.814270 |
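One way to compare how well each tree generalizes is to look at the difference between training and test F1. The sketch below is a standalone illustration using the F1 values copied from the two comparison tables; the dictionary layout and variable names are assumptions made for this example.

```python
# F1 scores copied from the training and test comparison tables above
f1_scores = {
    "Decision Tree sklearn": {"train": 0.990914, "test": 0.803238},
    "Decision Tree (Pre-Pruning)": {"train": 0.758779, "test": 0.754676},
    "Decision Tree (Post-Pruning)": {"train": 0.913977, "test": 0.814270},
}

# Gap between training and test F1; a large gap hints at overfitting
f1_gap = {
    name: round(scores["train"] - scores["test"], 6)
    for name, scores in f1_scores.items()
}

for name, gap in f1_gap.items():
    print(f"{name}: train-test F1 gap = {gap:.6f}")
```

On these numbers the pre-pruned tree has the smallest gap (about 0.004), while the post-pruned tree has the highest test F1 with a gap of about 0.10.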
Actionable Insights and Recommendations for INN Hotels¶
Cancellation and Refund Policy Optimization¶
1. Introduce Non-Refundable Booking Options¶
- Guests booking early (long `lead_time`) and through `Online` channels are more likely to cancel.
- Offer discounted non-refundable rates to secure revenue upfront and reduce cancellation risk.
2. Flexible Refunds for Short Lead Time Bookings¶
- Guests with shorter `lead_time` have a lower cancellation probability.
- Allow full or partial refunds for bookings made less than 15 days i
3. Monitor and Encourage Special Requests¶
- Guests with 0–2 special requests show a higher likelihood of cancellation, especially those with zero requests (over 40% cancellation rate).
- Conversely, guests with 3 or more requests almost never cancel.
- Recommendation:
  - Encourage guests to submit special requests during booking (e.g., preferences for room setup, amenities).
  - Use the number of requests as a positive signal of booking commitment rather than a cancellation risk.
  - Avoid penalizing or flagging high-request bookings; these guests are highly likely to show up.
Other Strategic Recommendations¶
4. Optimize Online Booking Channels¶
- `market_segment_type_Online` is a strong predictor of cancellations.
- Encourage direct bookings by:
  - Offering exclusive perks or discounts via the hotel’s website.
  - Displaying clear cancellation policies during checkout.
5. Dynamic Pricing for High-Risk Segments¶
- Long `lead_time`, no `parking`, and low `avg_price_per_room` correlate with higher cancellation risk.
- Raise prices or require partial prepayment for bookings matching this profile.
6. Targeted Offers for Reliable Segments¶
- Guests with short `lead_time` and fewer requests are more likely to show up.
- Incentivize these custo
Model Insights¶
Pre-Pruned Decision Tree generalizes best, with almost no gap between training and test scores: F1 train - 0.758779, F1 test - 0.754676
Post-Pruned Decision Tree has the highest test F1 but looks overfitted: F1 train - 0.913977, F1 test - 0.814270
Top Features Contributing to Predictions:
- `lead_time`
- `market_segment_type_Online`
- `no_of_special_requests`
- `avg_price_per_room`

These features should inform policy design and marketing strategy.